y = β₀ + β₁x
Where:
β₀ is the intercept.
β₁ is the slope.
y = β₀ + β₁x + β₂x²
Where:
β₂ is the coefficient of the squared term.
The Curve:
The x² term introduces a curve into the relationship.
If β₂ is positive, the curve opens upward (like a U).
If β₂ is negative, the curve opens downward (like an inverted U).
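As a brief worked step (standard calculus, not part of the original output), the turning point of the quadratic curve follows from setting its slope to zero:

```latex
\frac{dy}{dx} = \beta_1 + 2\beta_2 x = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{\beta_1}{2\beta_2}
```

When β₂ is negative, x* is where the fitted curve peaks; when β₂ is positive, it is where the curve bottoms out.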
# Descriptive statistics
Cleaned_KMA_Data %>% skim(Population)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 76 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | 0 | 1 | 2917182 | 470947.5 | 2233000 | 2549000 | 2907000 | 3277000 | 3630000 | ▇▅▅▅▅ |
Cleaned_KMA_Data %>% skim(IGF)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 76 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| IGF | 0 | 1 | 21745673 | 5157798 | 12025624 | 20538616 | 22708381 | 24445521 | 29377277 | ▃▁▆▇▃ |
# Histograms
ggplot(Cleaned_KMA_Data, aes(x = Population)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Population", x = "Population", y = "Frequency") +
scale_x_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = IGF)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of IGF Revenue", x = "IGF Revenue", y = "Frequency") +
scale_x_continuous(labels = comma)
# Growth Rate (Percentage)
Cleaned_KMA_Data <- Cleaned_KMA_Data %>%
mutate(
Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
IGF_Growth_Rate = c(NA, diff(IGF) / IGF[-length(IGF)] * 100)
)
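To make the growth-rate formula concrete, here is a minimal sketch on a made-up vector (the values in `pop` are hypothetical, not taken from the data). It applies the same `diff()` expression used above, so each entry is the percentage change from the previous year, and the first entry is `NA` because there is no earlier year to compare with.

```r
# Hypothetical population values for three consecutive years (illustration only)
pop <- c(100, 110, 121)

# Same formula as above: change from the previous year divided by the
# previous year's value, expressed as a percentage
growth <- c(NA, diff(pop) / pop[-length(pop)] * 100)

print(growth)
```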
# Plot of Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Population)) +
geom_point(aes(y = Population), color = "dodgerblue") +
labs(title = "Population Trend", x = "Year", y = "Population") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = IGF)) +
geom_point(aes(y = IGF), color = "dodgerblue") +
labs(title = "IGF Trend", x = "Year", y = "IGF") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = IGF, color = "IGF")) +
geom_point(aes(y = IGF, color = "IGF")) +
labs(title = "Population vs. IGF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
scale_y_continuous(labels = comma)
# Growth rate plots
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_line(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
geom_point(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
labs(title = "Population Growth vs. IGF Growth", x = "Year", y = "Growth Rate (%)", color = "Type") +
scale_y_continuous(labels = percent_format(scale = 1)) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") # Add horizontal line at zero
The histograms show uneven distributions for both population and IGF revenue, with the tallest population bar at around 3,500,000. The trend plots show clearly that IGF revenue, which fluctuated considerably, is not directly linked to population, which rose steadily.
mod1 <- lm(IGF ~ Population, data = Cleaned_KMA_Data)
summary(mod1)
##
## Call:
## lm(formula = IGF ~ Population, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6098470 -2859531 180262 2732570 8474201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6304153.559 9432943.752 0.668 0.521
## Population 5.293 3.196 1.656 0.132
##
## Residual standard error: 4760000 on 9 degrees of freedom
## Multiple R-squared: 0.2336, Adjusted R-squared: 0.1484
## F-statistic: 2.743 on 1 and 9 DF, p-value: 0.1321
Cleaned_KMA_Data %>%
ggplot(aes(x = Population, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and IGF Revenue") +
scale_y_continuous(labels = scales::comma)
# The Quadratic Term
Cleaned_KMA_Data$Population_Squared <- Cleaned_KMA_Data$Population^2
# Quadratic Regression
mod_quad <- lm(IGF ~ Population + Population_Squared, data = Cleaned_KMA_Data)
summary(mod_quad)
##
## Call:
## lm(formula = IGF ~ Population + Population_Squared, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3567751 -1933315 -290773 1436917 5015707
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -160892119.79269299 45252401.28199609 -3.555 0.00745 **
## Population 122.28092169 31.44603817 3.889 0.00462 **
## Population_Squared -0.00001998 0.00000536 -3.728 0.00580 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3051000 on 8 degrees of freedom
## Multiple R-squared: 0.72, Adjusted R-squared: 0.65
## F-statistic: 10.29 on 2 and 8 DF, p-value: 0.006144
ggplot(Cleaned_KMA_Data, aes(x = Population, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and IGF Revenue") +
scale_y_continuous(labels = comma)
Linear Regression:
Coefficients:
Intercept: 6,304,153.559
Population: 5.293
P-values: Intercept: 0.521 (not significant) Population: 0.132 (not significant)
R-squared: Multiple R-squared: 0.2336 Adjusted R-squared: 0.1484
Interpretation: The linear model shows a weak and statistically insignificant relationship between population and IGF revenue. Population explains only about 23.36% of the variance in IGF.
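As a quick check on the reported figures, adjusted R-squared can be recomputed from the multiple R-squared, the number of observations (n = 11) and the number of predictors. The sketch below uses the standard formula; the helper name `adj_r2` is hypothetical and not part of the original analysis.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
# (adj_r2 is a hypothetical helper, not a function from the original analysis)
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Linear model: R-squared = 0.2336, n = 11 observations, 1 predictor
adj_linear <- adj_r2(0.2336, n = 11, p = 1)
print(round(adj_linear, 4))
```

This agrees with the adjusted R-squared of 0.1484 shown in the summary output.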
Quadratic Regression:
Coefficients: Intercept: -160,892,119.79 Population: 122.28 Population_Squared: -0.00001998
P-values: All coefficients are highly statistically significant (p < 0.01).
R-squared: Multiple R-squared: 0.72 Adjusted R-squared: 0.65
Interpretation: The quadratic model shows a strong and statistically significant relationship between population and IGF revenue. The significant Population_Squared term confirms a non-linear (quadratic) relationship.
The R-squared of 0.72 indicates that the quadratic model explains 72% of the variance in IGF, which is a significant improvement over the linear model.
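Because the Population_Squared coefficient is negative, the fitted curve has a peak. A rough sketch of where that peak lies, using the rounded coefficients reported above and the standard vertex formula -β₁ / (2β₂), is given below. The variable names are illustrative, and because the squared-term coefficient is printed to only four significant figures the result is approximate.

```r
# Rounded coefficients copied from the quadratic model summary
b1 <- 122.28092169   # Population
b2 <- -0.00001998    # Population_Squared

# Population at which the fitted quadratic curve reaches its maximum
turning_point <- -b1 / (2 * b2)
print(round(turning_point))
```

The result is roughly 3.06 million, which falls inside the observed population range (2,233,000 to 3,630,000), so the fitted curve rises and then declines within the data.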
# Transformed Model
lm(Ln_IGF ~ Ln_Pop, data = Cleaned_KMA_Data) %>% summary()
##
## Call:
## lm(formula = Ln_IGF ~ Ln_Pop, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30907 -0.15091 0.01315 0.16298 0.37510
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1417 6.6600 0.322 0.7551
## Ln_Pop 0.9898 0.4477 2.211 0.0544 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2315 on 9 degrees of freedom
## Multiple R-squared: 0.3519, Adjusted R-squared: 0.2799
## F-statistic: 4.887 on 1 and 9 DF, p-value: 0.05438
# Scatter Plots (Transformed Data)
ggplot(Cleaned_KMA_Data, aes(x = Ln_Pop, y = Ln_IGF)) +
geom_point() +
labs(title = "Log(Population) vs. Log(IGF Revenue)", x = "Log(Population)", y = "Log(IGF Revenue)")
After the log transformation, the log-log model showed a stronger relationship than the simple linear model (R-squared of 0.3519 versus 0.2336), and the relationship is marginally significant (p = 0.0544). The square root model is better than the simple linear model but not as good as the log model. The quadratic model provided the best fit among the models considered (R-squared = 0.72), and the significant Population_Squared term supports a non-linear relationship.
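In a log-log model the slope is read as an elasticity. Assuming `Ln_Pop` and `Ln_IGF` are logarithms of Population and IGF (the transformation step is not shown in this section), the sketch below converts the reported slope of 0.9898 into the percentage change in IGF associated with a 1% increase in population.

```r
# Slope on Ln_Pop copied from the log-log model summary
slope <- 0.9898

# Implied percentage change in IGF for a 1% increase in population,
# assuming both variables were log-transformed with the same base
pct_change_igf <- (1.01^slope - 1) * 100
print(round(pct_change_igf, 2))
```

That is, a 1% rise in population is associated with roughly a 0.99% rise in IGF, although the estimate is only marginally significant.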
# Scatter Plot
ggplot(Cleaned_KMA_Data, aes(x = Population, y = IGF)) +
geom_point() +
labs(title = "Population vs. IGF Revenue", x = "Population", y = "IGF Revenue")
# Residual Diagnostics (Linear Model)
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)), aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Linear Model)", x = "Fitted Values", y = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod1)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals (Linear Model)", x = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod1)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals (Linear Model)")
# Residuals vs. Fitted Values
ggplot(data = data.frame(residuals = residuals(mod_quad), fitted = fitted(mod_quad)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Quadratic Model)", x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals (Quadratic Model)", x = "Residuals")
# Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals (Quadratic Model)")
# Durbin-Watson Test (Autocorrelation)
dwtest(mod_quad)
##
## Durbin-Watson test
##
## data: mod_quad
## DW = 1.3293, p-value = 0.01328
## alternative hypothesis: true autocorrelation is greater than 0
# Breusch-Pagan Test (Homoscedasticity)
bptest(mod_quad)
##
## studentized Breusch-Pagan test
##
## data: mod_quad
## BP = 0.88786, df = 2, p-value = 0.6415
# Variance Inflation Factor (VIF) - Multicollinearity
vif(mod_quad)
## Population Population_Squared
## 235.5719 235.5719
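Some of the assumptions are violated. The Durbin-Watson test indicates positive autocorrelation (p = 0.01328), and the VIF values of about 235.6 indicate severe multicollinearity between Population and Population_Squared. The Breusch-Pagan test (p = 0.6415) gives no evidence of heteroscedasticity.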
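The very large VIF arises because Population and its square are almost perfectly correlated over the observed range. The sketch below illustrates this on made-up, evenly spaced population values (hypothetical, not the KMA data). It computes the two-predictor VIF by hand as 1 / (1 - r²) and shows how subtracting the mean before squaring removes most of the overlap; `vif_two` is an illustrative helper, not a function from any package.

```r
# Hypothetical, evenly spaced populations spanning roughly the observed range
pop <- seq(2233000, 3630000, length.out = 11)

# With exactly two predictors, VIF = 1 / (1 - r^2), where r is their correlation
# (vif_two is a hypothetical helper for illustration)
vif_two <- function(x1, x2) 1 / (1 - cor(x1, x2)^2)

# VIF when the raw predictor and its square are used together
vif_raw <- vif_two(pop, pop^2)

# Center the predictor first, then square it
pop_c <- pop - mean(pop)
vif_centered <- vif_two(pop_c, pop_c^2)

print(c(raw = vif_raw, centered = vif_centered))
```

With these illustrative values the raw VIF is in the hundreds while the centered VIF is essentially 1; the exact figures for the real data depend on the actual population values.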
# Centering Population Model
Cleaned_KMA_Data$Population_Centered <- Cleaned_KMA_Data$Population - mean(Cleaned_KMA_Data$Population)
Cleaned_KMA_Data$Population_Centered_Squared <- Cleaned_KMA_Data$Population_Centered^2
mod_quad_centered <- lm(IGF ~ Population_Centered + Population_Centered_Squared, data = Cleaned_KMA_Data)
summary(mod_quad_centered)
##
## Call:
## lm(formula = IGF ~ Population_Centered + Population_Centered_Squared,
## data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3567751 -1933315 -290773 1436917 5015707
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 25774690.24572124 1419247.31189146 18.161
## Population_Centered 5.69657280 2.05167508 2.777
## Population_Centered_Squared -0.00001998 0.00000536 -3.728
## Pr(>|t|)
## (Intercept) 0.0000000868 ***
## Population_Centered 0.0241 *
## Population_Centered_Squared 0.0058 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3051000 on 8 degrees of freedom
## Multiple R-squared: 0.72, Adjusted R-squared: 0.65
## F-statistic: 10.29 on 2 and 8 DF, p-value: 0.006144
# Diagnostic Tests on Centered Model
# Residuals vs. Fitted Values (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered), fitted = fitted(mod_quad_centered)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Centered Quadratic Model)", x = "Fitted Values", y = "Residuals")
# Histogram of Residuals (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals (Centered Quadratic Model)", x = "Residuals")
# Q-Q Plot of Residuals (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals (Centered Quadratic Model)")
# Durbin-Watson Test (Centered)
dwtest(mod_quad_centered)
##
## Durbin-Watson test
##
## data: mod_quad_centered
## DW = 1.3293, p-value = 0.01328
## alternative hypothesis: true autocorrelation is greater than 0
# Breusch-Pagan Test (Centered)
bptest(mod_quad_centered)
##
## studentized Breusch-Pagan test
##
## data: mod_quad_centered
## BP = 0.88786, df = 2, p-value = 0.6415
# VIF (Centered)
vif(mod_quad_centered)
## Population_Centered Population_Centered_Squared
## 1.002787 1.002787
From the analysis so far, we found a strong, curved relationship between population and IGF revenue. The quadratic model (IGF ~ Population + Population_Squared) is the most appropriate of the models considered for describing this relationship (overall p-value = 0.006144, Multiple R-squared = 0.72). After centering, the VIF falls to about 1.00 and the Breusch-Pagan test shows no evidence of heteroscedasticity (p = 0.6415), so the only remaining concern is autocorrelation (Durbin-Watson p = 0.01328). This suggests the model may not fully capture time-related patterns. A larger sample may help to resolve this.
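For readers unfamiliar with the Durbin-Watson statistic, the sketch below computes it by hand on made-up residual vectors (hypothetical, not the model residuals). The statistic is the sum of squared successive differences divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The helper name `dw_stat` is illustrative.

```r
# Durbin-Watson statistic computed directly from a residual vector
# (dw_stat is a hypothetical helper for illustration)
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Residuals that drift smoothly from negative to positive:
# positive autocorrelation, statistic well below 2
dw_trending <- dw_stat(c(-2, -1, 1, 2))

# Residuals that flip sign at every step:
# negative autocorrelation, statistic above 2
dw_alternating <- dw_stat(c(1, -1, 1, -1))

print(c(trending = dw_trending, alternating = dw_alternating))
```

The reported value of 1.3293 lies below 2, which is the direction associated with positive autocorrelation.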
Cleaned_KMA_Data %>% skim(Population)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 81 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Population | 0 | 1 | 2917182 | 470947.5 | 2233000 | 2549000 | 2907000 | 3277000 | 3630000 | ▇▅▅▅▅ |
Cleaned_KMA_Data %>% skim(DACF)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 81 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| DACF | 0 | 1 | 5446273 | 2142998 | 2523770 | 3107713 | 6274711 | 7244209 | 7396115 | ▅▂▁▃▇ |
# Histograms
ggplot(Cleaned_KMA_Data, aes(x = Population)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Population", x = "Population")
ggplot(Cleaned_KMA_Data, aes(x = DACF)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of DACF Revenue", x = "DACF Revenue")
# Growth Rates (Percentage)
Cleaned_KMA_Data <- Cleaned_KMA_Data %>%
mutate(
Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
DACF_Growth_Rate = c(NA, diff(DACF) / DACF[-length(DACF)] * 100)
)
# Plotting Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Population)) +
geom_point(aes(y = Population), color = "dodgerblue") +
labs(title = "Population Trend", x = "Year", y = "Population") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = DACF)) +
geom_point(aes(y = DACF), color = "dodgerblue") +
labs(title = "DACF Trend", x = "Year", y = "DACF") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = DACF, color = "DACF")) +
geom_point(aes(y = DACF, color = "DACF")) +
labs(title = "Population vs. DACF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
scale_y_continuous(labels = scales::comma)
# Plotting Growth Rates
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
geom_line(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
geom_point(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
labs(title = "Population Growth vs. DACF Growth", x = "Year", y = "Growth Rate (%)", color = "Type")+
geom_hline(yintercept = 0, linetype = "dashed", color = "red")
The histograms show uneven distributions for both population and DACF revenue. The trend plots show clearly that DACF revenue, which fluctuated considerably, is not directly linked to population, which rose steadily.
mod2 <- lm(DACF ~ Population, data = Cleaned_KMA_Data)
summary(mod2)
##
## Call:
## lm(formula = DACF ~ Population, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3266584 -1367838 20734 1109642 2600396
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1461779.744 3822900.230 -0.382 0.711
## Population 2.368 1.295 1.828 0.101
##
## Residual standard error: 1929000 on 9 degrees of freedom
## Multiple R-squared: 0.2708, Adjusted R-squared: 0.1898
## F-statistic: 3.343 on 1 and 9 DF, p-value: 0.1008
Cleaned_KMA_Data %>%
ggplot(aes(x = Population, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) + # Added confidence intervals
labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and DACF Revenue") +
scale_y_continuous(labels = scales::comma)
# Quadratic Regression
mod_quad2 <- lm(DACF ~ Population + Population_Squared, data = Cleaned_KMA_Data)
summary(mod_quad2)
##
## Call:
## lm(formula = DACF ~ Population + Population_Squared, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2563570 -863129 234425 991928 1959111
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -52642251.321280584 24234543.657086555 -2.172 0.0616 .
## Population 38.179151978 16.840661785 2.267 0.0531 .
## Population_Squared -0.000006117 0.000002870 -2.131 0.0657 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1634000 on 8 degrees of freedom
## Multiple R-squared: 0.5349, Adjusted R-squared: 0.4186
## F-statistic: 4.6 on 2 and 8 DF, p-value: 0.04681
ggplot(Cleaned_KMA_Data, aes(x = Population, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and DACF Revenue") +
scale_y_continuous(labels = comma)
The linear model shows a weak and statistically insignificant relationship between population and DACF revenue (p = 0.101). Population explains only about 27.08% of the variance in DACF.
The quadratic model is statistically significant overall (F-test p = 0.04681, R-squared = 0.5349), but the individual Population and Population_Squared terms are only marginally significant (p = 0.0531 and p = 0.0657).
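The contribution of the squared term can also be expressed as a partial F-test comparing the linear and quadratic fits. The sketch below reconstructs that test from the R-squared values reported in the two DACF summaries (0.2708 and 0.5349) with n = 11; the variable names are illustrative. With a single added term, the partial F statistic should equal the square of that term's t value, up to rounding.

```r
# R-squared values copied from the two DACF model summaries
r2_linear <- 0.2708
r2_quad   <- 0.5349
n         <- 11      # observations
df_resid  <- n - 3   # residual df of the quadratic model (intercept + 2 slopes)

# Partial F statistic for adding one term (Population_Squared)
f_partial <- ((r2_quad - r2_linear) / 1) / ((1 - r2_quad) / df_resid)

# Corresponding p-value from the F distribution with 1 and 8 df
p_partial <- pf(f_partial, df1 = 1, df2 = df_resid, lower.tail = FALSE)

print(c(F = f_partial, t_squared = (-2.131)^2, p = p_partial))
```

The statistic comes out at about 4.54, matching the squared t value of the Population_Squared term, with a p-value close to the 0.0657 reported in the summary.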
#Scatter Plot
ggplot(Cleaned_KMA_Data, aes(x = Population, y = DACF)) +
geom_point() +
labs(title = "Population vs. DACF Revenue",
x = "Population", y = "DACF Revenue")
# Residual
ggplot(data = data.frame(residuals = residuals(mod2),
fitted = fitted(mod2)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted",
x = "Fitted Values", y = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod2)),
aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals", x = "Residuals")
ggplot(data = data.frame(residuals = residuals(mod2)),
aes(sample = residuals)) +
stat_qq() +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals")
# Autocorrelation
dwtest(mod2)
##
## Durbin-Watson test
##
## data: mod2
## DW = 1.6094, p-value = 0.1371
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)
bptest(mod2)
##
## studentized Breusch-Pagan test
##
## data: mod2
## BP = 0.000024319, df = 1, p-value = 0.9961
# Multicollinearity
# This is a simple linear regression with one predictor (Population), so multicollinearity is not an issue.
# Multivariate Normality
# It is a simple linear regression with one predictor (Population), therefore this is not an issue.
The scatter plot shows a positive but non-linear relationship: as population increases, DACF revenue tends to increase as well. The histogram of residuals shows a potential violation of the normality assumption. For the linear model (mod2), the Durbin-Watson test revealed no significant autocorrelation (p = 0.1371), and the Breusch-Pagan test is consistent with homoscedasticity (p = 0.9961).
#Transformed Models
lm(log(DACF) ~ log(Population), data = Cleaned_KMA_Data) %>%
summary()
#
# Call:
# lm(formula = log(DACF) ~ log(Population), data = Cleaned_KMA_Data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.67169 -0.25524 -0.00088 0.23629 0.55431
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -10.2034 11.4342 -0.892 0.3954
# log(Population) 1.7227 0.7687 2.241 0.0517 .
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 0.3974 on 9 degrees of freedom
# Multiple R-squared: 0.3582, Adjusted R-squared: 0.2869
# F-statistic: 5.023 on 1 and 9 DF, p-value: 0.05175
lm( sqrt(DACF)~sqrt(Population), data = Cleaned_KMA_Data ) %>%
summary()
#
# Call:
# lm(formula = sqrt(DACF) ~ sqrt(Population), data = Cleaned_KMA_Data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -736.39 -294.12 -16.42 253.97 594.30
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -1131.5392 1692.4528 -0.669 0.5205
# sqrt(Population) 2.0065 0.9909 2.025 0.0735 .
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 433.7 on 9 degrees of freedom
# Multiple R-squared: 0.313, Adjusted R-squared: 0.2366
# F-statistic: 4.1 on 1 and 9 DF, p-value: 0.07354
# Scatter Plots (Transformed Data)
ggplot(Cleaned_KMA_Data, aes(x = log(Population), y = log(DACF))) +
geom_point() +
geom_smooth(method = "lm") +
labs(title = "Log(Population) vs. Log(DACF Revenue)",
x = "Log(Population)", y = "Log(DACF Revenue)")
ggplot(Cleaned_KMA_Data, aes(x = sqrt(Population), y = sqrt(DACF))) +
geom_point() +
geom_smooth(method = "lm") +
labs(title = "Sqrt(Population) vs. Sqrt(DACF Revenue)",
x = "Sqrt(Population)", y = "Sqrt(DACF Revenue)")
The linear regression results earlier indicated that the relationship between population size and DACF revenue is not statistically significant. The log and square root models likewise did not yield relationships that are statistically significant at the 5% level (p = 0.0517 and p = 0.0735 respectively).
# Centering Population Model
Cleaned_KMA_Data$Population_Centered <- Cleaned_KMA_Data$Population - mean(Cleaned_KMA_Data$Population)
Cleaned_KMA_Data$Population_Centered_Squared <- Cleaned_KMA_Data$Population_Centered^2
mod_quad_centered <- lm(DACF ~ Population_Centered + Population_Centered_Squared, data = Cleaned_KMA_Data)
summary(mod_quad_centered)
##
## Call:
## lm(formula = DACF ~ Population_Centered + Population_Centered_Squared,
## data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2563570 -863129 234425 991928 1959111
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 6679595.937710916 760065.984695505 8.788
## Population_Centered 2.491502754 1.098757369 2.268
## Population_Centered_Squared -0.000006117 0.000002870 -2.131
## Pr(>|t|)
## (Intercept) 0.0000221 ***
## Population_Centered 0.0531 .
## Population_Centered_Squared 0.0657 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1634000 on 8 degrees of freedom
## Multiple R-squared: 0.5349, Adjusted R-squared: 0.4186
## F-statistic: 4.6 on 2 and 8 DF, p-value: 0.04681
# Diagnostic Tests on Centered Model
# Residuals vs. Fitted Values (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered), fitted = fitted(mod_quad_centered)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Centered Quadratic Model)", x = "Fitted Values", y = "Residuals")
# Histogram of Residuals (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals (Centered Quadratic Model)", x = "Residuals")
# Q-Q Plot of Residuals (Centered)
ggplot(data = data.frame(residuals = residuals(mod_quad_centered)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals (Centered Quadratic Model)")
# Durbin-Watson Test (Centered)
dwtest(mod_quad_centered)
##
## Durbin-Watson test
##
## data: mod_quad_centered
## DW = 2.585, p-value = 0.5951
## alternative hypothesis: true autocorrelation is greater than 0
# Breusch-Pagan Test (Centered)
bptest(mod_quad_centered)
##
## studentized Breusch-Pagan test
##
## data: mod_quad_centered
## BP = 1.8585, df = 2, p-value = 0.3949
# VIF (Centered)
vif(mod_quad_centered)
## Population_Centered Population_Centered_Squared
## 1.002787 1.002787
The centered quadratic model for DACF shows some evidence of a quadratic relationship with population. The F-statistic is statistically significant for the overall model (p = 0.04681), but both Population_Centered (p = 0.0531) and Population_Centered_Squared (p = 0.0657) are only marginally significant. The diagnostic tests give no evidence against the regression assumptions for the centered quadratic model: Durbin-Watson p = 0.5951, Breusch-Pagan p = 0.3949, and VIF values of about 1.00.
# Calculate descriptive statistics
desc_stats <- Cleaned_KMA_Data %>%
summarize(
Population_mean = mean(Population),
Population_sd = sd(Population),
Population_min = min(Population),
Population_max = max(Population),
Capital_Expenditure_mean = mean(Capital_Expenditure),
Capital_Expenditure_sd = sd(Capital_Expenditure),
Capital_Expenditure_min = min(Capital_Expenditure),
Capital_Expenditure_max = max(Capital_Expenditure),
Recrrent_Expenditure_mean = mean(Recrrent_Expenditure),
Recrrent_Expenditure_sd = sd(Recrrent_Expenditure),
Recrrent_Expenditure_min = min(Recrrent_Expenditure),
Recrrent_Expenditure_max = max(Recrrent_Expenditure)
)
cat("
## Descriptive Statistics
| Statistic | Population | Capital Expenditure | Recurrent Expenditure |
|------------------------|------------|---------------------|-----------------------|
| Mean |", format(desc_stats$Population_mean, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_mean, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_mean, big.mark = ",", digits = 2), "|
| Standard Deviation |", format(desc_stats$Population_sd, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_sd, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_sd, big.mark = ",", digits = 2), "|
| Minimum |", format(desc_stats$Population_min, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_min, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_min, big.mark = ",", digits = 2), "|
| Maximum |", format(desc_stats$Population_max, big.mark = ",", digits = 2),
"|", format(desc_stats$Capital_Expenditure_max, big.mark = ",", digits = 2),
"|", format(desc_stats$Recrrent_Expenditure_max, big.mark = ",", digits = 2), "|
\n")
##
## ## Descriptive Statistics
##
## | Statistic | Population | Capital Expenditure | Recurrent Expenditure |
## |------------------------|------------|---------------------|-----------------------|
## | Mean | 2,917,182 | 16,386,471 | 17,381,914 |
## | Standard Deviation | 470,948 | 13,818,549 | 4,197,344 |
## | Minimum | 2,233,000 | 6,278,840 | 8,979,764 |
## | Maximum | 3,630,000 | 46,223,724 | 24,001,764 |
# Capital Expenditure Histogram
cap_hist <- ggplot(Cleaned_KMA_Data, aes(x = Capital_Expenditure)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "skyblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Capital Expenditure", x = "Capital Expenditure (Ghana Cedis)", y = "Density") +
scale_x_continuous(labels = comma)
# Recurrent Expenditure Histogram
rec_hist <- ggplot(Cleaned_KMA_Data, aes(x = Recrrent_Expenditure)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "lightgreen", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Recurrent Expenditure", x = "Recurrent Expenditure (Ghana Cedis)", y = "Density") +
scale_x_continuous(labels = comma)
# Population Histogram
pop_hist <- ggplot(Cleaned_KMA_Data, aes(x = Population)) +
geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
geom_density(color = "red") +
labs(title = "Distribution of Population", x = "Population", y = "Density") +
scale_x_continuous(labels = comma)
cap_hist
rec_hist
pop_hist
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Population)) +
geom_point(aes(y = Population), color = "dodgerblue") +
labs(title = "Population Trend", x = "Year", y = "Population") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
labs(title = "Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
theme(axis.title.y.right = element_text(vjust=2))
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
labs(title = "Population and Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
theme(axis.title.y.right = element_text(vjust=2))
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_line(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_point(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
labs(title = "Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
# Calculate Per Capita Values
Cleaned_KMA_Data$Capital_Exp_Per_Capita <- Cleaned_KMA_Data$Capital_Expenditure / Cleaned_KMA_Data$Population
# Plotting Trends (Improved)
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Population, color = "Population")) +
geom_point(aes(y = Population, color = "Population")) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
labs(title = "Population and Capital Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
theme(axis.title.y.right = element_text(vjust=2))
# Per Capita Analysis
average_capita <- mean(Cleaned_KMA_Data$Capital_Exp_Per_Capita)
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
geom_hline(yintercept = average_capita, linetype = "dashed", color = "red")+
labs(title = "Capital Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
Cleaned_KMA_Data$Recrrent_Exp_Per_Capita <- Cleaned_KMA_Data$Recrrent_Expenditure / Cleaned_KMA_Data$Population
average_rec_capita <- mean(Cleaned_KMA_Data$Recrrent_Exp_Per_Capita)
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_point(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
geom_hline(yintercept = average_rec_capita, linetype = "dashed", color = "red") +
labs(title = "Recurrent Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
scale_y_continuous(labels = comma)
mod3 <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population, data = Cleaned_KMA_Data)
summary(mod3)
## Response Capital_Expenditure :
##
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14004703 -7884170 -5442992 1911261 28795577
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35476333.333 28140934.259 1.261 0.239
## Population -6.544 9.534 -0.686 0.510
##
## Residual standard error: 14200000 on 9 degrees of freedom
## Multiple R-squared: 0.04974, Adjusted R-squared: -0.05585
## F-statistic: 0.4711 on 1 and 9 DF, p-value: 0.5098
##
##
## Response Recrrent_Expenditure :
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6389181 -2699416 -268187 2219372 7900222
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8799112.358 8277023.496 1.063 0.315
## Population 2.942 2.804 1.049 0.321
##
## Residual standard error: 4176000 on 9 degrees of freedom
## Multiple R-squared: 0.109, Adjusted R-squared: 0.009972
## F-statistic: 1.101 on 1 and 9 DF, p-value: 0.3215
mod_cap <- lm(Capital_Expenditure ~ Population, data = Cleaned_KMA_Data)
summary(mod_cap)
##
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14004703 -7884170 -5442992 1911261 28795577
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35476333.333 28140934.259 1.261 0.239
## Population -6.544 9.534 -0.686 0.510
##
## Residual standard error: 14200000 on 9 degrees of freedom
## Multiple R-squared: 0.04974, Adjusted R-squared: -0.05585
## F-statistic: 0.4711 on 1 and 9 DF, p-value: 0.5098
mod_rec <- lm(Recrrent_Expenditure ~ Population, data = Cleaned_KMA_Data)
summary(mod_rec)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6389181 -2699416 -268187 2219372 7900222
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8799112.358 8277023.496 1.063 0.315
## Population 2.942 2.804 1.049 0.321
##
## Residual standard error: 4176000 on 9 degrees of freedom
## Multiple R-squared: 0.109, Adjusted R-squared: 0.009972
## F-statistic: 1.101 on 1 and 9 DF, p-value: 0.3215
Cleaned_KMA_Data %>%
ggplot(aes(x = Population, y = Capital_Expenditure)) +
geom_point()+
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "Capital Expenditure", title = "Linear Relationship between Population and Capital Expenditure") +
scale_y_continuous(labels = scales::comma)
Cleaned_KMA_Data %>%
ggplot(aes(x = Population, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(x = "Population", y = "Recurrent Expenditure", title = "Linear Relationship between Population and Recurrent Expenditure") +
scale_y_continuous(labels = scales::comma)
From the linear regression results, neither model is statistically significant: the overall F-tests give p = 0.5098 for capital expenditure and p = 0.3215 for recurrent expenditure. The estimated slope on population is negative for capital expenditure (-6.544) and positive for recurrent expenditure (2.942), but neither differs significantly from zero. The low R-squared values (0.050 and 0.109) indicate that population alone is a poor linear predictor of either type of expenditure. Given these linear models, it cannot be concluded that changes in population reliably predict changes in either expenditure, and any observed pattern could plausibly be due to chance.
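As a consistency check on the printed output: in a regression with a single predictor, the overall F-statistic equals the squared t-statistic of the slope and can also be recovered from R-squared. Using the capital expenditure values (n = 11, so 9 residual degrees of freedom):

```latex
F = t^2 = \frac{R^2}{1 - R^2}\,(n - 2)
\qquad\Rightarrow\qquad
(-0.686)^2 \approx 0.471,
\qquad
\frac{0.04974}{1 - 0.04974} \times 9 \approx 0.471
```

Both routes agree with the reported F-statistic of 0.4711, which is also why the slope p-value and the overall model p-value coincide in each of the two simple regressions.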
dwtest(mod_cap)
##
## Durbin-Watson test
##
## data: mod_cap
## DW = 0.78624, p-value = 0.001835
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(mod_rec)
##
## Durbin-Watson test
##
## data: mod_rec
## DW = 2.2187, p-value = 0.4983
## alternative hypothesis: true autocorrelation is greater than 0
# Autocorrelation
dwtest(mod3)
##
## Durbin-Watson test
##
## data: mod3
## DW = 0.9003, p-value = 0.004745
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)
bptest(mod_cap)
##
## studentized Breusch-Pagan test
##
## data: mod_cap
## BP = 0.67429, df = 1, p-value = 0.4116
bptest(mod_rec)
##
## studentized Breusch-Pagan test
##
## data: mod_rec
## BP = 3.1297, df = 1, p-value = 0.07688
bptest(mod3)
##
## studentized Breusch-Pagan test
##
## data: mod3
## BP = 3.9615, df = 1, p-value = 0.04655
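From the above tests, the Durbin-Watson test indicates positive autocorrelation in the capital expenditure model (DW = 0.786, p = 0.0018) and in the combined model mod3 (DW = 0.900, p = 0.0047), but not in the recurrent expenditure model (DW = 2.219, p = 0.498). The Breusch-Pagan test does not reject homoscedasticity at the 5% level for mod_cap (p = 0.412) or mod_rec (p = 0.077), and rejects it only marginally for mod3 (p = 0.047). The models therefore violate some of the regression assumptions, most clearly the independence of residuals in the capital expenditure model.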
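To make the Durbin-Watson statistic less of a black box, the sketch below computes it directly from model residuals in base R. The helper name `dw_stat` is hypothetical and the data are simulated (not the KMA data); it only reproduces the statistic, not the p-value that `dwtest()` reports.

```r
# Hypothetical helper: Durbin-Watson statistic from a fitted lm object.
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2). Values near 2 suggest no
# first-order autocorrelation; values well below 2 suggest positive
# autocorrelation.
dw_stat <- function(model) {
  e <- residuals(model)
  sum(diff(e)^2) / sum(e^2)
}

# Simulated illustration (not the KMA data): errors follow an AR(1) process
set.seed(1)
n <- 200
x <- seq_len(n)
e <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))
y <- 5 + 0.3 * x + e

fit <- lm(y ~ x)
dw_stat(fit)  # well below 2 for positively autocorrelated errors
```

Because the statistic depends on the order of the residuals, it is only meaningful when the rows are sorted by time, as they are for yearly data.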
Cleaned_KMA_Data$Ln_Population <- log(Cleaned_KMA_Data$Population)
Cleaned_KMA_Data$Ln_Capital_Expenditure <- log(Cleaned_KMA_Data$Capital_Expenditure)
#Transformed Models
mod4 <- lm(log(Capital_Expenditure) ~ log(Population), data = Cleaned_KMA_Data)
summary(mod4)
##
## Call:
## lm(formula = log(Capital_Expenditure) ~ log(Population), data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8910 -0.4519 -0.2916 0.3138 1.2492
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.709 20.986 1.559 0.154
## log(Population) -1.100 1.411 -0.779 0.456
##
## Residual standard error: 0.7294 on 9 degrees of freedom
## Multiple R-squared: 0.06324, Adjusted R-squared: -0.04084
## F-statistic: 0.6076 on 1 and 9 DF, p-value: 0.4557
mod_rec_log <- lm(log(Recrrent_Expenditure) ~ log(Population), data = Cleaned_KMA_Data)
summary(mod_rec_log)
##
## Call:
## lm(formula = log(Recrrent_Expenditure) ~ log(Population), data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45074 -0.14941 -0.02363 0.17303 0.45815
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.1923 7.4054 0.836 0.425
## log(Population) 0.7024 0.4978 1.411 0.192
##
## Residual standard error: 0.2574 on 9 degrees of freedom
## Multiple R-squared: 0.1811, Adjusted R-squared: 0.09015
## F-statistic: 1.991 on 1 and 9 DF, p-value: 0.1919
ggplot(Cleaned_KMA_Data, aes(x = log(Population), y = log(Capital_Expenditure))) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)+
labs(title = "Log(Population) vs. Log(Capital Expenditure)",
x = "Log(Population)", y = "Log(Capital Expenditure)")
ggplot(Cleaned_KMA_Data, aes(x = log(Population), y = log(Recrrent_Expenditure))) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Log(Population) vs. Log(Recurrent Expenditure)",
x = "Log(Population)", y = "Log(Recurrent Expenditure)")
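For reading the log-log coefficients reported above, it helps to recall that the slope in such a model is an elasticity, i.e. the approximate percentage change in the response per one percent change in the predictor:

```latex
\log(y) = \beta_0 + \beta_1 \log(x)
\quad\Rightarrow\quad
\beta_1 = \frac{d \log y}{d \log x} \approx \frac{\%\,\Delta y}{\%\,\Delta x}
```

With the printed estimates, a 1% increase in population is associated with roughly a 0.70% increase in recurrent expenditure and a 1.10% decrease in capital expenditure, although neither estimate is statistically distinguishable from zero.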
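After the log transformations, neither model is statistically significant (p = 0.4557 for capital expenditure and p = 0.1919 for recurrent expenditure). Quadratic models are fitted below.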
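Cleaned_KMA_Data$Recrrent_Expenditure_squared <- Cleaned_KMA_Data$Recrrent_Expenditure^2
Cleaned_KMA_Data$Capital_Expenditure_squared <- Cleaned_KMA_Data$Capital_Expenditure^2
# Squared population term used by the quadratic model (harmless to re-create if it already exists)
Cleaned_KMA_Data$Population_Squared <- Cleaned_KMA_Data$Population^2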
mod_quad <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population + Population_Squared, data = Cleaned_KMA_Data)
# View the summary
summary(mod_quad)
## Response Capital_Expenditure :
##
## Call:
## lm(formula = Capital_Expenditure ~ Population + Population_Squared,
## data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13671517 -6284597 -594718 5275778 20398369
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -370475527.52983952 170044595.95024911 -2.179 0.0610 .
## Population 277.50152126 118.16453279 2.348 0.0468 *
## Population_Squared -0.00004852 0.00002014 -2.409 0.0426 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11470000 on 8 degrees of freedom
## Multiple R-squared: 0.4492, Adjusted R-squared: 0.3116
## F-statistic: 3.263 on 2 and 8 DF, p-value: 0.09201
##
##
## Response Recrrent_Expenditure :
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Population + Population_Squared,
## data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4746979 -3030151 -413017 2141307 7879933
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -40224855.249450110 63325308.704196945 -0.635 0.543
## Population 37.244339443 44.004959260 0.846 0.422
## Population_Squared -0.000005859 0.000007500 -0.781 0.457
##
## Residual standard error: 4270000 on 8 degrees of freedom
## Multiple R-squared: 0.1721, Adjusted R-squared: -0.03485
## F-statistic: 0.8316 on 2 and 8 DF, p-value: 0.4697
# Scatter Plots (Quadratic Fits)
ggplot(Cleaned_KMA_Data, aes(x = Population, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
labs(x = "Population", y = "Capital Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Capital Expenditure") +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Population, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
labs(x = "Population", y = "Recurrent Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Recurrent Expenditure") +
scale_y_continuous(labels = comma)
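In the quadratic model for capital expenditure, both the linear and squared population terms are individually significant at the 5% level (p = 0.0468 and p = 0.0426), and R-squared rises to 0.449, compared with 0.050 for the linear model and 0.063 for the log model. However, the overall F-test is not significant at the 5% level (p = 0.092). This suggests, without conclusively establishing, that the relationship between population and capital expenditure is non-linear; the negative squared term implies an inverted-U shape. For recurrent expenditure, neither the log model nor the quadratic model showed a statistically significant relationship. The relationship between population and capital expenditure is therefore somewhat better described by a quadratic function, while the relationship between population and recurrent expenditure remains unclear.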
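One way to read the quadratic fit for capital expenditure is to locate the turning point of the fitted curve. The sketch below plugs in the coefficients as printed in the summary (so the result is only as precise as the rounded output) and compares it with the population range reported in the descriptive statistics; the variable names are illustrative.

```r
# Coefficients copied from the printed summary (Capital_Expenditure response)
b1 <- 277.50152126   # Population
b2 <- -0.00004852    # Population_Squared

# The vertex of y = b0 + b1*x + b2*x^2 lies at x = -b1 / (2 * b2)
turning_point <- -b1 / (2 * b2)
turning_point

# Observed population range (p0 and p100 from the skim() output)
pop_range <- c(2233000, 3630000)
turning_point > pop_range[1] && turning_point < pop_range[2]
```

The turning point comes out at roughly 2.86 million people, inside the observed range, so the fitted curve rises and then falls within the data rather than only beyond it. Given the non-significant overall F-test, this should be read as descriptive rather than as an established threshold.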
Cleaned_KMA_Data$Population_Centered <- Cleaned_KMA_Data$Population - mean(Cleaned_KMA_Data$Population)
Cleaned_KMA_Data$Population_Centered_Squared <- Cleaned_KMA_Data$Population_Centered^2
# Quadratic Model
cap_exp_quad_mod <- lm(Capital_Expenditure ~ Population_Centered + Population_Centered_Squared, data = Cleaned_KMA_Data)
summary(cap_exp_quad_mod)
##
## Call:
## lm(formula = Capital_Expenditure ~ Population_Centered + Population_Centered_Squared,
## data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13671517 -6284597 -594718 5275778 20398369
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 26168907.58408355 5333094.57326141 4.907
## Population_Centered -5.56479613 7.70956348 -0.722
## Population_Centered_Squared -0.00004852 0.00002014 -2.409
## Pr(>|t|)
## (Intercept) 0.00118 **
## Population_Centered 0.49097
## Population_Centered_Squared 0.04258 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11470000 on 8 degrees of freedom
## Multiple R-squared: 0.4492, Adjusted R-squared: 0.3116
## F-statistic: 3.263 on 2 and 8 DF, p-value: 0.09201
# Capital Expenditure Diagnostics (Quadratic)
# Residuals vs. Fitted
ggplot(data = data.frame(residuals = residuals(cap_exp_quad_mod), fitted = fitted(cap_exp_quad_mod)),
aes(x = fitted, y = residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = "Residuals vs. Fitted (Capital Expenditure - Quadratic)", x = "Fitted Values", y = "Residuals")
# Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(cap_exp_quad_mod)), aes(x = residuals)) +
geom_histogram(bins = 10, fill = "skyblue", color = "black") +
labs(title = "Histogram of Residuals (Capital Expenditure - Quadratic)", x = "Residuals")
# Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(cap_exp_quad_mod)), aes(sample = residuals)) +
geom_point(stat = "qq") +
stat_qq_line() +
labs(title = "Q-Q Plot of Residuals (Capital Expenditure - Quadratic)")
# Durbin-Watson Test
dwtest(cap_exp_quad_mod)
##
## Durbin-Watson test
##
## data: cap_exp_quad_mod
## DW = 1.2356, p-value = 0.007197
## alternative hypothesis: true autocorrelation is greater than 0
# Breusch-Pagan Test
bptest(cap_exp_quad_mod)
##
## studentized Breusch-Pagan test
##
## data: cap_exp_quad_mod
## BP = 3.4371, df = 2, p-value = 0.1793
# VIF
vif(cap_exp_quad_mod)
## Population_Centered Population_Centered_Squared
## 1.002787 1.002787
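The diagnostics for the quadratic capital expenditure model show that the checked assumptions are broadly met except for independence of residuals: the Breusch-Pagan test does not reject homoscedasticity (p = 0.1793) and the VIF values are close to 1 after centering, but the Durbin-Watson test still indicates positive autocorrelation (DW = 1.2356, p = 0.0072), which might be related to the small sample size (11 observations).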
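The near-1 VIF values are a consequence of centering population before squaring it. A minimal sketch of the effect, using hypothetical evenly spaced population values (not the KMA data) and the two-predictor identity VIF = 1 / (1 - r^2); the helper `vif_two` is an illustrative name, not a function from the `car` package:

```r
# Hypothetical, evenly spaced population values (not the KMA data)
pop <- seq(2.2e6, 3.6e6, length.out = 11)
pop_c <- pop - mean(pop)

# With exactly two predictors, VIF = 1 / (1 - r^2),
# where r is the correlation between them
vif_two <- function(a, b) 1 / (1 - cor(a, b)^2)

vif_raw <- vif_two(pop, pop^2)           # raw term vs. its square: highly collinear
vif_centered <- vif_two(pop_c, pop_c^2)  # centered term vs. its square: close to 1

c(raw = vif_raw, centered = vif_centered)
```

Centering changes only the parameterisation: the residuals, R-squared and the squared-term coefficient are identical in the centered and uncentered summaries printed above.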
This part examines the relationship between total revenue growth rate and infrastructure delivery (capital expenditure per capita).
# Descriptive statistics
Cleaned_KMA_Data %>% skim(Capital_Exp_Per_Capita)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 87 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Capital_Exp_Per_Capita | 0 | 1 | 5.85 | 5.02 | 1.73 | 2.6 | 4.01 | 7.14 | 16.76 | ▇▁▁▁▁ |
Cleaned_KMA_Data %>% skim(TtRev_Growth_Rate)
| Name | Piped data |
| Number of rows | 11 |
| Number of columns | 87 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| TtRev_Growth_Rate | 1 | 0.91 | 5.33 | 20.53 | -27.39 | -9.3 | 2.22 | 20.6 | 40.94 | ▂▇▅▇▂ |
# Histograms
ggplot(Cleaned_KMA_Data, aes(x = Capital_Exp_Per_Capita)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Capital expenditure per capita", x = "Capital expenditure per capita") +
scale_x_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = TtRev_Growth_Rate)) +
geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
labs(title = "Distribution of Total Revenue Growth Rate", x = "Total revenue growth rate") +
scale_x_continuous(labels = percent_format(scale = 1))
# Plotting Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
geom_point(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
labs(
title = "Total Revenue Growth Rate vs. Capital Expenditure Per Capita",
x = "Year",
y = "Total Revenue Growth Rate (%)"
) +
scale_y_continuous(
labels = percent_format(scale = 1),
sec.axis = sec_axis(~., name = "Capital Expenditure Per Capita")
) +
scale_color_manual(
values = c("Total Revenue Growth Rate" = "lightseagreen", "Capital Expenditure Per Capita" = "indianred"),
name = "Type"
) +
theme(axis.title.y.right = element_text(vjust = 2))
The histograms show an uneven, right-skewed distribution of capital expenditure per capita. The trend plot shows that the total revenue growth rate, which fluctuated considerably, is not clearly linked to capital expenditure per capita, which remained comparatively stable.
mod5 <- lm(Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_KMA_Data)
summary(mod5)
##
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.743 -3.264 -2.444 2.431 10.004
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.95376 1.79404 3.319 0.0106 *
## TtRev_Growth_Rate 0.03290 0.08883 0.370 0.7207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.472 on 8 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.01686, Adjusted R-squared: -0.106
## F-statistic: 0.1372 on 1 and 8 DF, p-value: 0.7207
ggplot(Cleaned_KMA_Data, aes(x = TtRev_Growth_Rate, y = Capital_Exp_Per_Capita)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE)+
labs(title = "Revenue Growth vs. Capital Expenditure (Per Capita)",
x = "Total Revenue Growth Rate (%)",
y = "Capital Expenditure Per Capita")
The regression results show no statistically significant relationship between total revenue growth rate and infrastructure delivery (capital expenditure per capita): the p-value (0.7207) is greater than the 0.05 significance level. This means that changes in revenue growth do not significantly predict changes in capital expenditure per capita in this model. The R-squared (0.01686) indicates that only 1.69% of the variation in capital expenditure per capita can be explained by the total revenue growth rate.
Cleaned_KMA_Data$Expenditure_Growth <- c(NA, diff(Cleaned_KMA_Data$Total_Expenditure) / Cleaned_KMA_Data$Total_Expenditure[-nrow(Cleaned_KMA_Data)]) * 100
mod6 <- lm(Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_KMA_Data)
summary(mod6)
##
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.520 -3.215 -2.549 3.080 8.842
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.58243 1.78006 3.136 0.0139 *
## Expenditure_Growth 0.04955 0.05656 0.876 0.4065
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.272 on 8 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.08753, Adjusted R-squared: -0.02652
## F-statistic: 0.7674 on 1 and 8 DF, p-value: 0.4065
ggplot(Cleaned_KMA_Data, aes(x = Expenditure_Growth, y = Capital_Exp_Per_Capita)) +
geom_point() + geom_smooth(method = "lm", se = TRUE)+
labs(title = "Expenditure Growth vs. Capital Expenditure (Per Capita)",
x = "Expenditure Growth Rate (%)",
y = "Capital Expenditure Per Capita")
lm(log(Capital_Exp_Per_Capita) ~ Expenditure_Growth, data = Cleaned_KMA_Data) %>%
summary()
##
## Call:
## lm(formula = log(Capital_Exp_Per_Capita) ~ Expenditure_Growth,
## data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9846 -0.5061 -0.2824 0.6932 1.1417
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.423110 0.273628 5.201 0.000822 ***
## Expenditure_Growth 0.008126 0.008695 0.935 0.377336
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8104 on 8 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.09844, Adjusted R-squared: -0.01426
## F-statistic: 0.8735 on 1 and 8 DF, p-value: 0.3773
From the linear regression results, there is no statistically significant relationship between expenditure growth and capital expenditure per capita (p = 0.4065), and the results remain non-significant after log-transforming the response (p = 0.3773).
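The growth-rate predictor in these models is a year-over-year percentage change. A minimal sketch of that calculation, with a hypothetical helper `pct_growth` and made-up figures, shows why the first year has no value and hence why one observation is dropped in the regressions:

```r
# Hypothetical helper: year-over-year percentage change of a numeric series.
# The first element is NA because there is no previous year to compare with.
pct_growth <- function(x) {
  c(NA, diff(x) / head(x, -1)) * 100
}

# Made-up expenditure figures, for illustration only
spend <- c(100, 110, 99, 120)
growth <- pct_growth(spend)
growth
```

The values are already in percentage units, so axis labels for such a variable should not rescale them by 100 again.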
# Expenditure Composition:
Cleaned_KMA_Data$CapExp_Pct <- (Cleaned_KMA_Data$Capital_Expenditure / Cleaned_KMA_Data$Total_Expenditure)
Cleaned_KMA_Data$CapExp_Rev_Ratio <- (Cleaned_KMA_Data$Capital_Expenditure / Cleaned_KMA_Data$Total_Revenue)
# Expenditure Composition
ggplot(Cleaned_KMA_Data, aes(x = Year, y = CapExp_Pct)) +
geom_bar(stat = "identity", fill = "dodgerblue") +
geom_point()+
labs(title = "Capital Expenditure as Percentage of Total Expenditure",
x = "Year",
y = "Percentage") +
scale_y_continuous(labels = percent_format(accuracy = 1))
# Trends of Revenue and Expenditure over the years.
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue, color = "Total Revenue")) +
geom_point(aes(y = Total_Revenue)) +
geom_line(aes(y = Total_Expenditure, color = "Total Expenditure")) +
geom_point(aes(y = Total_Expenditure)) +
labs(title = "Revenue and Expenditure Trends Over Years",
x = "Year",
y = "Amount (Ghana Cedis)", color = "Type") +
scale_color_manual(values = c("Total Revenue" = "blue", "Total Expenditure" = "red")) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
geom_line(aes(y = IGF, color = "IGF"), size = 1) +
geom_line(aes(y = DACF, color = "DACF"), size = 1) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
labs(
title = "Revenue and Expenditure Trends Over Years",
x = "Year",
y = "Amount (Ghana Cedis)",
color = "Type"
) +
scale_color_manual(
values = c(
"Total Revenue" = "blue",
"Other Sources" = "skyblue",
"IGF" = "green",
"DACF" = "darkgray",
"Capital Expenditure" = "purple",
"Total Expenditure" = "red"
)
) +
scale_y_continuous(labels = scales::comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# IGF to Total Expenditure Ratio
ggplot(Cleaned_KMA_Data, aes(x = Year, y = IGF_TE)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "IGF to Total Expenditure Ratio Over Years",
x = "Year",
y = "Ratio (IGF/Total Expenditure)"
) +
scale_y_continuous(labels = percent_format(accuracy = 1))
# CapExp_Rev_Ratio plot.
ggplot(Cleaned_KMA_Data, aes(x = Year, y = CapExp_Rev_Ratio)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Capital Expenditure to Total Revenue Ratio Over Years",
x = "Year",
y = "Ratio (Capital Expenditure/Total Revenue)"
) +
scale_y_continuous(labels = comma)
cor.test(Cleaned_KMA_Data$Total_Expenditure, Cleaned_KMA_Data$Total_Revenue)
##
## Pearson's product-moment correlation
##
## data: Cleaned_KMA_Data$Total_Expenditure and Cleaned_KMA_Data$Total_Revenue
## t = 10.303, df = 9, p-value = 0.000002788
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8495727 0.9898772
## sample estimates:
## cor
## 0.9601297
In the above plots, capital expenditure as a percentage of total expenditure shows relatively high capital investment peaking around 2016, followed by a sustained decline. There is also a strong positive correlation between total revenue and total expenditure (r = 0.96, p < 0.001), with both peaking around 2016 and falling afterwards.
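The test statistic and confidence interval in the `cor.test()` output can be reproduced from the correlation and the sample size alone. The sketch below uses the printed values and standard formulas (a t statistic with n - 2 degrees of freedom and a Fisher z interval); the variable names are illustrative.

```r
# Values copied from the cor.test() output
r <- 0.9601297
n <- 11

# t statistic for H0: correlation = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via the Fisher z-transformation
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(c(z - half_width, z + half_width))

c(t = t_stat, p = p_val)
ci
```

The wide interval relative to the point estimate reflects the small sample: with 11 yearly observations, even a correlation of 0.96 is compatible with a true value as low as about 0.85.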
# Revenue Per Capita
Cleaned_KMA_Data$Total_Revenue_Per_Capita <- Cleaned_KMA_Data$Total_Revenue / Cleaned_KMA_Data$Population
Cleaned_KMA_Data$IGF_Per_Capita <- Cleaned_KMA_Data$IGF / Cleaned_KMA_Data$Population
Cleaned_KMA_Data$DACF_Per_Capita <- Cleaned_KMA_Data$DACF / Cleaned_KMA_Data$Population
# Time Series Plots (Improved)
# Total Revenue and Expenditure Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
geom_line(aes(y = IGF, color = "IGF"), size = 1) +
geom_point(aes(y = IGF, color = "IGF")) +
geom_line(aes(y = DACF, color = "DACF"), size = 1) +
geom_point(aes(y = DACF, color = "DACF")) +
geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
geom_point(aes(y = Others_Sources, color = "Other Sources")) +
labs(
title = "Revenue and Expenditure Trends Over Years",
x = "Year",
y = "Amount (Ghana Cedis)",
color = "Type"
) +
scale_color_manual(
values = c(
"Total Revenue" = "blue",
"Other Sources" = "skyblue",
"IGF" = "green",
"DACF" = "darkgray",
"Capital Expenditure" = "purple",
"Total Expenditure" = "red"
)
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Population Trend
ggplot(Cleaned_KMA_Data, aes(x = Year, y = Population)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Population Trend Over Years",
x = "Year",
y = "Population"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# IGF to Total Expenditure Ratio
ggplot(Cleaned_KMA_Data, aes(x = Year, y = IGF_TE)) +
geom_line(color = "steelblue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "IGF to Total Expenditure Ratio Over Years",
x = "Year",
y = "Ratio (IGF/Total Expenditure)"
) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Per capita plot
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
geom_point(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
geom_line(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
geom_point(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
geom_line(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
geom_point(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
labs(title = "Revenue Per Capita trends", x = "Year", y = "Amount (Ghana Cedis)", color = "Type") +
scale_y_continuous(labels = comma)
cor_matrix <- cor(Cleaned_KMA_Data[, c("Population", "Total_Revenue", "Total_Expenditure", "IGF_TE", "CapExp_Pct", "IGF")], use = "complete.obs")
print(cor_matrix)
## Population Total_Revenue Total_Expenditure IGF_TE
## Population 1.0000000 0.1868282 0.1034625 0.44600059
## Total_Revenue 0.1868282 1.0000000 0.9601297 -0.43561569
## Total_Expenditure 0.1034625 0.9601297 1.0000000 -0.55086584
## IGF_TE 0.4460006 -0.4356157 -0.5508658 1.00000000
## CapExp_Pct -0.4403760 0.6182895 0.7034446 -0.50888391
## IGF 0.4833200 0.8707321 0.8079852 0.02758598
## CapExp_Pct IGF
## Population -0.4403760 0.48331996
## Total_Revenue 0.6182895 0.87073207
## Total_Expenditure 0.7034446 0.80798522
## IGF_TE -0.5088839 0.02758598
## CapExp_Pct 1.0000000 0.44219773
## IGF 0.4421977 1.00000000
corrplot(cor_matrix, main = "Correlation matrix of population and expenditure patterns")
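In the correlation matrix above, there is a strong positive correlation between total revenue and total expenditure (0.96), and IGF is also strongly correlated with both total revenue (0.87) and total expenditure (0.81).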
# Total Revenue vs Population
model_revenue_pop <- lm(Total_Revenue ~ Population, data = Cleaned_KMA_Data)
summary(model_revenue_pop)
##
## Call:
## lm(formula = Total_Revenue ~ Population, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21596729 -6733191 -491577 5868454 23135274
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34528437.260 25284085.974 1.366 0.205
## Population 4.887 8.566 0.571 0.582
##
## Residual standard error: 12760000 on 9 degrees of freedom
## Multiple R-squared: 0.0349, Adjusted R-squared: -0.07233
## F-statistic: 0.3255 on 1 and 9 DF, p-value: 0.5823
# Total Expenditure vs Population
model_expenditure_pop <- lm(Total_Expenditure ~ Population, data = Cleaned_KMA_Data)
summary(model_expenditure_pop)
##
## Call:
## lm(formula = Total_Expenditure ~ Population, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23753659 -8681700 -3938558 3341443 27290302
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40109934.627 31720613.252 1.264 0.238
## Population 3.354 10.747 0.312 0.762
##
## Residual standard error: 16010000 on 9 degrees of freedom
## Multiple R-squared: 0.0107, Adjusted R-squared: -0.09922
## F-statistic: 0.09738 on 1 and 9 DF, p-value: 0.7621
# Capital Expenditure vs Total Revenue and IGF_TE
model_capital_rev_igf <- lm(Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_KMA_Data)
summary(model_capital_rev_igf)
##
## Call:
## lm(formula = Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11267026 -3857738 -1737562 5764288 12456890
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4631203.8550 24413618.4027 -0.190 0.8543
## Total_Revenue 0.7929 0.2434 3.257 0.0116 *
## IGF_TE -39404383.9422 37091492.8899 -1.062 0.3191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8535000 on 8 degrees of freedom
## Multiple R-squared: 0.6948, Adjusted R-squared: 0.6185
## F-statistic: 9.105 on 2 and 8 DF, p-value: 0.008679
# IGF_TE vs Population and Total Revenue
model_igfte_pop_rev <- lm(IGF_TE ~ Population + Total_Revenue, data = Cleaned_KMA_Data)
summary(model_igfte_pop_rev)
##
## Call:
## lm(formula = IGF_TE ~ Population + Total_Revenue, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.092402 -0.036195 -0.002448 0.030562 0.111567
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.346718175591 0.142208087563 2.438 0.0407 *
## Population 0.000000093807 0.000000044637 2.102 0.0688 .
## Total_Revenue -0.000000003528 0.000000001706 -2.068 0.0725 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06531 on 8 degrees of freedom
## Multiple R-squared: 0.478, Adjusted R-squared: 0.3474
## F-statistic: 3.662 on 2 and 8 DF, p-value: 0.07427
# Visualizations
# Scatter plot: Total Revenue vs Population
ggplot(Cleaned_KMA_Data, aes(x = Population, y = Total_Revenue)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Total Revenue vs Population", x = "Population", y = "Total Revenue") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Scatter plot: Total Expenditure vs Population
ggplot(Cleaned_KMA_Data, aes(x = Population, y = Total_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Total Expenditure vs Population", x = "Population", y = "Total Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Scatter plot: Capital Expenditure vs Total Revenue
ggplot(Cleaned_KMA_Data, aes(x = Total_Revenue, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital Expenditure vs Total Revenue", x = "Total Revenue", y = "Capital Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
# Scatter plot: IGF_TE vs Population
ggplot(Cleaned_KMA_Data, aes(x = Population, y = IGF_TE)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF_TE vs Population", x = "Population", y = "IGF_TE") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = percent_format(accuracy = 1))
ggplot(Cleaned_KMA_Data, aes(x = Total_Revenue, y = IGF_TE)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF_TE vs Total Revenue", x = "Total Revenue", y = "IGF_TE") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = percent_format(accuracy = 1))
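In the regression results above, we found no significant linear relationship between Total Revenue and Population, Total Expenditure and Population, or Capital Expenditure and Total Revenue. In the model of IGF_TE on Population and Total Revenue, neither predictor is significant at the 5% level: Population (p = 0.0688) and Total Revenue (p = 0.0725) are only marginally significant at the 10% level, and the overall model is likewise not significant at 5% (F = 3.662, p = 0.07427).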
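The significance statements above can be checked programmatically instead of being read off the printed summaries. The following is a minimal base R sketch; the helper names `significance_table` and `overall_p_value` and the default `alpha = 0.05` are assumptions made for illustration and are not part of the original analysis.

```r
# Hypothetical helper: list each coefficient's p-value and flag those below alpha.
significance_table <- function(model, alpha = 0.05) {
  coefs <- summary(model)$coefficients
  data.frame(
    term        = rownames(coefs),
    estimate    = coefs[, "Estimate"],
    p_value     = coefs[, "Pr(>|t|)"],
    significant = coefs[, "Pr(>|t|)"] < alpha,
    row.names   = NULL
  )
}

# Hypothetical helper: overall F-test p-value from the stored F statistic.
overall_p_value <- function(model) {
  f <- summary(model)$fstatistic  # named vector: value, numdf, dendf
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# Usage with the model fitted above (runs only if it exists in the session).
if (exists("model_igfte_pop_rev")) {
  print(significance_table(model_igfte_pop_rev))
  print(overall_p_value(model_igfte_pop_rev))
}
```

Calling `significance_table(model, alpha = 0.10)` would flag the marginal terms as well, which makes the chosen threshold explicit in the write-up.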
# no variables
# IGF Trend
ggplot(Cleaned_KMA_Data, aes(x = Year, y = IGF)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "IGF Trend Over Years",
x = "Year",
y = "IGF (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_point(aes(y = Act_Permit, color = "Permit Fees")) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_point(aes(y = Act_Property_Rates, color = "Property Rates")) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_point(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue")) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_point(aes(y = Act_Licenses, color = "Licenses")) +
geom_line(aes(y = Act_Fees, color = "Act Fees"), size = 1) +
geom_point(aes(y = Act_Fees, color = "Act Fees")) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1")+
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# IGF and Land-Based Revenue Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = IGF, color = "IGF"), size = 1) +
geom_point(aes(y = IGF, color = "IGF")) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_point(aes(y = Act_Permit, color = "Permit Fees")) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_point(aes(y = Act_Property_Rates, color = "Property Rates")) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_point(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue")) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_point(aes(y = Act_Licenses, color = "Licenses")) +
geom_line(aes(y = Act_Fees, color = "Act Fees"), size = 1) +
geom_point(aes(y = Act_Fees, color = "Act Fees")) +
labs(
title = "IGF vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set1")+
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show the trends in IGF and in each land-based revenue stream over the years.
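The trend plots above repeat a `geom_line()`/`geom_point()` pair for every series. One alternative is to reshape the revenue columns into long format and map the series name to `color`, so a single pair of layers draws all lines. The sketch below uses base R for the reshape; the helper name `revenue_to_long` and the output column names `Revenue_Type` and `Revenue` are assumptions for illustration, and it assumes `Year` is stored as a number.

```r
# Hypothetical helper: stack several revenue columns into long format.
# as.numeric() drops attributes (such as SPSS labels) that could differ by column.
revenue_to_long <- function(data, value_cols, year_col = "Year") {
  pieces <- lapply(value_cols, function(col) {
    data.frame(
      Year         = as.numeric(data[[year_col]]),  # assumes Year is numeric
      Revenue_Type = col,
      Revenue      = as.numeric(data[[col]]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

# Usage with the data set above (runs only if the data and ggplot2 are available).
if (exists("Cleaned_KMA_Data") && requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  land_cols <- c("Act_Permit", "Act_Property_Rates", "Act_Stool_Lands",
                 "Act_Licenses", "Act_Fees")
  long_data <- revenue_to_long(Cleaned_KMA_Data, land_cols)
  print(
    ggplot(long_data, aes(x = Year, y = Revenue, color = Revenue_Type)) +
      geom_line() +
      geom_point() +
      labs(title = "Land-Based Revenue Trends Over Years",
           x = "Year", y = "Revenue (Ghana Cedis)", color = "Revenue Type")
  )
}
```

With this layout, adding or removing a series only requires changing `land_cols`.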
# IGF vs Land-Based Revenues
model_igf_land <- lm(IGF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_KMA_Data)
summary(model_igf_land)
##
## Call:
## lm(formula = IGF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_KMA_Data)
##
## Residuals:
## 4 5 6 7 8 9 10 11
## 488734 69604 188199 1134469 1144925 -1430427 -315928 -1279575
## attr(,"label")
## [1] "IGF"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10514109.67108 12045471.58607 0.873 0.475
## Act_Permit 0.04273 2.18823 0.020 0.986
## Act_Property_Rates 1.87421 0.66688 2.810 0.107
## Act_Stool_Lands 2.11643 3.29037 0.643 0.586
## Act_Licenses 0.34264 0.87450 0.392 0.733
## Act_Fees 0.27096 1.08980 0.249 0.827
##
## Residual standard error: 1825000 on 2 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.8707, Adjusted R-squared: 0.5474
## F-statistic: 2.693 on 5 and 2 DF, p-value: 0.2926
# Visualizations
# Scatter plots (IGF vs each land-based revenue)
ggplot(Cleaned_KMA_Data, aes(x = Act_Permit, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Permit Fees", x = "Permit Fees", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Property_Rates, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Property Rates", x = "Property Rates", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Stool_Lands, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Licenses, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Licenses", x = "Licenses", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Fees, y = IGF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "IGF vs Act Fees", x = "Act Fees", y = "IGF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
cor_matrix_land_igf <- cor(Cleaned_KMA_Data[, c("IGF", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_igf)
## IGF Act_Permit Act_Property_Rates Act_Stool_Lands
## IGF 1.0000000 -0.16911687 0.9073488 -0.09833270
## Act_Permit -0.1691169 1.00000000 -0.2004438 0.09591229
## Act_Property_Rates 0.9073488 -0.20044384 1.0000000 -0.30690790
## Act_Stool_Lands -0.0983327 0.09591229 -0.3069079 1.00000000
## Act_Licenses 0.4277484 0.15362045 0.4007133 -0.19224945
## Act_Fees 0.3145898 -0.39059567 0.2489881 0.20927846
## Act_Licenses Act_Fees
## IGF 0.4277484 0.3145898
## Act_Permit 0.1536204 -0.3905957
## Act_Property_Rates 0.4007133 0.2489881
## Act_Stool_Lands -0.1922495 0.2092785
## Act_Licenses 1.0000000 -0.1910590
## Act_Fees -0.1910590 1.0000000
corrplot(cor_matrix_land_igf)
The multiple regression of IGF on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and Act fees) is not statistically significant overall (F = 2.693 on 5 and 2 DF, p = 0.2926), and no individual coefficient is significant at the 5% level. The R-squared of 0.8707 means that 87.07% of the variation in IGF is explained by these revenues within the sample, but the adjusted R-squared falls to 0.5474 because the model estimates six parameters from only eight complete observations, leaving 2 residual degrees of freedom.
The correlation matrix shows that IGF is strongly correlated with property rates (r = 0.907).
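The summary output notes that 3 observations were deleted due to missingness, which leaves very few residual degrees of freedom for a five-predictor model. A quick way to see this before fitting is to count the complete cases across the model variables. The sketch below is illustrative; the helper name `model_sample_info` is an assumption and not part of the original analysis.

```r
# Hypothetical helper: report how many rows lm() would actually use
# and how many residual degrees of freedom would remain.
model_sample_info <- function(data, response, predictors) {
  vars     <- c(response, predictors)
  n_used   <- sum(complete.cases(data[, vars, drop = FALSE]))
  n_params <- length(predictors) + 1  # slopes plus the intercept
  list(
    n_total     = nrow(data),
    n_used      = n_used,
    residual_df = n_used - n_params
  )
}

# Usage with the data set above (runs only if it exists in the session).
if (exists("Cleaned_KMA_Data")) {
  print(model_sample_info(
    Cleaned_KMA_Data, "IGF",
    c("Act_Permit", "Act_Property_Rates", "Act_Stool_Lands",
      "Act_Licenses", "Act_Fees")
  ))
}
```

When `residual_df` is this small, coefficient standard errors and the F-test are unstable, so a model with fewer predictors may be easier to defend.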
# Simple linear Regression Analysis
model_permit <- lm(IGF ~ Act_Permit, data = Cleaned_KMA_Data)
model_property <- lm(IGF ~ Act_Property_Rates, data = Cleaned_KMA_Data)
model_stool <- lm(IGF ~ Act_Stool_Lands, data = Cleaned_KMA_Data)
model_license <- lm(IGF ~ Act_Licenses, data = Cleaned_KMA_Data)
model_acts <- lm(IGF ~ Act_Fees, data = Cleaned_KMA_Data)
summary(model_permit)
##
## Call:
## lm(formula = IGF ~ Act_Permit, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3732610 -1145232 -349172 590514 5206807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25326874.088 2863160.904 8.846 0.000116 ***
## Act_Permit -1.295 3.081 -0.420 0.688900
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2888000 on 6 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.0286, Adjusted R-squared: -0.1333
## F-statistic: 0.1767 on 1 and 6 DF, p-value: 0.6889
summary(model_property)
##
## Call:
## lm(formula = IGF ~ Act_Property_Rates, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6630205 -1289675 812532 1495039 4614860
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10325184.2217 3392269.9583 3.044 0.01393 *
## Act_Property_Rates 3.2066 0.9049 3.544 0.00628 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3513000 on 9 degrees of freedom
## Multiple R-squared: 0.5825, Adjusted R-squared: 0.5361
## F-statistic: 12.56 on 1 and 9 DF, p-value: 0.006277
summary(model_stool)
##
## Call:
## lm(formula = IGF ~ Act_Stool_Lands, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3111405 -1735106 -517060 1167854 5068804
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24959913.626 3294423.878 7.576 0.000275 ***
## Act_Stool_Lands -1.135 4.690 -0.242 0.816811
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2916000 on 6 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.009669, Adjusted R-squared: -0.1554
## F-statistic: 0.05858 on 1 and 6 DF, p-value: 0.8168
summary(model_license)
##
## Call:
## lm(formula = IGF ~ Act_Licenses, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4018843 -3037663 -132112 2708785 4768204
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3798206.934 6725732.925 -0.565 0.58606
## Act_Licenses 3.515 0.915 3.841 0.00396 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3346000 on 9 degrees of freedom
## Multiple R-squared: 0.6212, Adjusted R-squared: 0.5791
## F-statistic: 14.76 on 1 and 9 DF, p-value: 0.003958
summary(model_acts)
##
## Call:
## lm(formula = IGF ~ Act_Fees, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4819866 -1158982 -130473 1655967 3448086
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6037112.6763 2764259.4758 2.184 0.056804 .
## Act_Fees 2.0925 0.3547 5.900 0.000229 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2464000 on 9 degrees of freedom
## Multiple R-squared: 0.7946, Adjusted R-squared: 0.7717
## F-statistic: 34.81 on 1 and 9 DF, p-value: 0.0002291
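The simple linear regressions of IGF on each land-based revenue found Act fees (p = 0.000229), licenses (p = 0.00396), and property rates (p = 0.00628) to be statistically significant; permit fees (p = 0.6889) and stool lands revenue (p = 0.8168) are not. Note that the permit fees and stool lands models use only 8 observations because of missing values, while the other three use all 11.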
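The five simple regressions can also be fitted in a loop and collected into one table, which avoids reusing object names such as `model_permit` for different responses and shows the sample size behind each fit. This is a sketch under stated assumptions: the helper name `fit_simple_models` and the layout of the result table are illustrative, not part of the original analysis.

```r
# Hypothetical helper: fit response ~ predictor for each predictor
# and collect the slope, its p-value, R-squared, and the sample size used.
fit_simple_models <- function(data, response, predictors) {
  rows <- lapply(predictors, function(pred) {
    fit   <- lm(reformulate(pred, response = response), data = data)
    coefs <- summary(fit)$coefficients
    data.frame(
      response  = response,
      predictor = pred,
      n         = nobs(fit),  # rows left after dropping missing values
      slope     = coefs[pred, "Estimate"],
      p_value   = coefs[pred, "Pr(>|t|)"],
      r_squared = summary(fit)$r.squared
    )
  })
  do.call(rbind, rows)
}

# Usage with the data set above (runs only if it exists in the session).
if (exists("Cleaned_KMA_Data")) {
  land_cols <- c("Act_Permit", "Act_Property_Rates", "Act_Stool_Lands",
                 "Act_Licenses", "Act_Fees")
  print(fit_simple_models(Cleaned_KMA_Data, "IGF", land_cols))
}
```

The same call with `"DACF"`, `"Capital_Expenditure"`, or `"Recrrent_Expenditure"` as the response would produce the corresponding tables for the later sections.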
# DACF Trend
ggplot(Cleaned_KMA_Data, aes(x = Year, y = DACF)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "DACF Trend Over Years",
x = "Year",
y = "DACF (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
#DACF and Land-Based Revenue Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
geom_line(aes(y = DACF, color = "DACF"), size = 1) +
labs(
title = "DACF vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show the trends in DACF and in each land-based revenue stream over the years.
# DACF vs Land-Based Revenues
model_DACF_land <- lm(DACF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_KMA_Data)
summary(model_DACF_land)
##
## Call:
## lm(formula = DACF ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_KMA_Data)
##
## Residuals:
## 4 5 6 7 8 9 10 11
## -120020 -153949 302564 15214 -208759 92085 -75089 147955
## attr(,"label")
## [1] "DACF"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10360261.1758 2136192.0104 4.850 0.0400 *
## Act_Permit 0.9798 0.3881 2.525 0.1275
## Act_Property_Rates 0.8016 0.1183 6.778 0.0211 *
## Act_Stool_Lands 4.5619 0.5835 7.818 0.0160 *
## Act_Licenses -0.9717 0.1551 -6.265 0.0245 *
## Act_Fees -0.3857 0.1933 -1.996 0.1841
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 323600 on 2 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.9828, Adjusted R-squared: 0.9397
## F-statistic: 22.8 on 5 and 2 DF, p-value: 0.04255
# Visualizations
# Scatter plots (DACF vs each land-based revenue)
ggplot(Cleaned_KMA_Data, aes(x = Act_Permit, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Permit Fees", x = "Permit Fees", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Property_Rates, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Property Rates", x = "Property Rates", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Stool_Lands, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Licenses, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Licenses", x = "Licenses", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Fees, y = DACF)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "DACF vs Act Fees", x = "Act Fees", y = "DACF") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
cor_matrix_land_DACF <- cor(Cleaned_KMA_Data[, c("DACF", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_DACF)
## DACF Act_Permit Act_Property_Rates Act_Stool_Lands
## DACF 1.0000000 0.16620999 0.1600019 0.67970875
## Act_Permit 0.1662100 1.00000000 -0.2004438 0.09591229
## Act_Property_Rates 0.1600019 -0.20044384 1.0000000 -0.30690790
## Act_Stool_Lands 0.6797088 0.09591229 -0.3069079 1.00000000
## Act_Licenses -0.4367652 0.15362045 0.4007133 -0.19224945
## Act_Fees 0.1692270 -0.39059567 0.2489881 0.20927846
## Act_Licenses Act_Fees
## DACF -0.4367652 0.1692270
## Act_Permit 0.1536204 -0.3905957
## Act_Property_Rates 0.4007133 0.2489881
## Act_Stool_Lands -0.1922495 0.2092785
## Act_Licenses 1.0000000 -0.1910590
## Act_Fees -0.1910590 1.0000000
corrplot(cor_matrix_land_DACF)
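The multiple regression of DACF on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and Act fees) is statistically significant overall (F = 22.8 on 5 and 2 DF, p = 0.04255), with a very high R-squared of 0.9828 and an adjusted R-squared of 0.9397, indicating a close fit within the sample. Among the individual terms, property rates, stool lands revenue, and licenses are significant at the 5% level; permit fees (p = 0.1275) and Act fees (p = 0.1841) are not. With only 2 residual degrees of freedom, these results should be read with caution.
The correlation matrix shows that DACF is weakly correlated with permit fees (r = 0.166), property rates (r = 0.160), and Act fees (r = 0.169), moderately positively correlated with stool lands revenue (r = 0.680), and moderately negatively correlated with licenses (r = -0.437).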
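The correlation matrix reports coefficients only, so it does not say whether a given correlation is distinguishable from zero with so few observations. One option is `cor.test()`, which returns the Pearson coefficient together with a p-value. The sketch below is illustrative: the helper name `correlation_with_response` is an assumption, and because `cor.test()` drops missing values pair by pair, its sample can differ from the `use = "complete.obs"` matrix above.

```r
# Hypothetical helper: Pearson correlation and p-value between the response
# and each predictor, using the complete pairs for each test.
correlation_with_response <- function(data, response, predictors) {
  rows <- lapply(predictors, function(pred) {
    test <- cor.test(as.numeric(data[[pred]]), as.numeric(data[[response]]))
    data.frame(
      predictor = pred,
      r         = unname(test$estimate),
      p_value   = test$p.value
    )
  })
  do.call(rbind, rows)
}

# Usage with the data set above (runs only if it exists in the session).
if (exists("Cleaned_KMA_Data")) {
  land_cols <- c("Act_Permit", "Act_Property_Rates", "Act_Stool_Lands",
                 "Act_Licenses", "Act_Fees")
  print(correlation_with_response(Cleaned_KMA_Data, "DACF", land_cols))
}
```

For a single predictor, this test is equivalent to the t-test on the slope in the corresponding simple linear regression, so the two sets of p-values should agree when they use the same rows.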
# Simple linear Regression Analysis
model_permit <- lm(DACF ~ Act_Permit, data = Cleaned_KMA_Data)
model_property <- lm(DACF ~ Act_Property_Rates, data = Cleaned_KMA_Data)
model_stool <- lm(DACF ~ Act_Stool_Lands, data = Cleaned_KMA_Data)
model_license <- lm(DACF ~ Act_Licenses, data = Cleaned_KMA_Data)
model_acts <- lm(DACF ~ Act_Fees, data = Cleaned_KMA_Data)
summary(model_permit)
##
## Call:
## lm(formula = DACF ~ Act_Permit, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2986267 -108409 314503 851847 1044571
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5982409.5747 1391338.2521 4.300 0.00509 **
## Act_Permit 0.6182 1.4973 0.413 0.69405
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1403000 on 6 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.02763, Adjusted R-squared: -0.1344
## F-statistic: 0.1705 on 1 and 6 DF, p-value: 0.694
summary(model_property)
##
## Call:
## lm(formula = DACF ~ Act_Property_Rates, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2688867 -1907169 -304689 1726427 2425100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2720415.1024 1960543.8659 1.388 0.199
## Act_Property_Rates 0.7654 0.5230 1.464 0.177
##
## Residual standard error: 2030000 on 9 degrees of freedom
## Multiple R-squared: 0.1922, Adjusted R-squared: 0.1025
## F-statistic: 2.142 on 1 and 9 DF, p-value: 0.1774
summary(model_stool)
##
## Call:
## lm(formula = DACF ~ Act_Stool_Lands, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1590830 -489304 275105 582961 1232548
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3976465.606 1179364.551 3.372 0.0150 *
## Act_Stool_Lands 3.811 1.679 2.270 0.0637 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1044000 on 6 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.462, Adjusted R-squared: 0.3723
## F-statistic: 5.152 on 1 and 6 DF, p-value: 0.06368
summary(model_license)
##
## Call:
## lm(formula = DACF ~ Act_Licenses, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2757256 -1593466 88313 1594348 2761270
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -25687.5518 4148378.3002 -0.006 0.995
## Act_Licenses 0.7530 0.5644 1.334 0.215
##
## Residual standard error: 2064000 on 9 degrees of freedom
## Multiple R-squared: 0.1651, Adjusted R-squared: 0.07235
## F-statistic: 1.78 on 1 and 9 DF, p-value: 0.2149
summary(model_acts)
##
## Call:
## lm(formula = DACF ~ Act_Fees, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2378336 -574278 294565 583368 2019158
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -434287.9786 1509706.6844 -0.288 0.78012
## Act_Fees 0.7833 0.1937 4.044 0.00291 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1346000 on 9 degrees of freedom
## Multiple R-squared: 0.645, Adjusted R-squared: 0.6056
## F-statistic: 16.35 on 1 and 9 DF, p-value: 0.002911
The simple linear regressions of DACF on each land-based revenue found only Act fees to be significant (p = 0.00291). Stool lands revenue is marginal (p = 0.0637), and the rest are not significant.
# Capital_Expenditure Trend
ggplot(Cleaned_KMA_Data, aes(x = Year, y = Capital_Expenditure)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Capital Expenditure Trend Over Years",
x = "Year",
y = "Capital_Expenditure (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
#Capital_Expenditure and Land-Based Revenue Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
geom_line(aes(y = Capital_Expenditure, color = "Capital_Expenditure"), size = 1) +
labs(
title = "Capital Exp. vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show the trends in Capital Expenditure and in each land-based revenue stream over the years.
# Capital_Expenditure vs Land-Based Revenues
model_Capital_Expenditure_land <- lm(Capital_Expenditure ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_KMA_Data)
summary(model_Capital_Expenditure_land)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Permit + Act_Property_Rates +
## Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_KMA_Data)
##
## Residuals:
## 4 5 6 7 8 9 10
## 952.8 -2395936.4 6106784.5 5146179.4 1269866.6 -4540645.1 -2673595.8
## 11
## -2913605.8
## attr(,"label")
## [1] "Capital Expenditure"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37081543.381 48366883.693 0.767 0.5234
## Act_Permit -2.638 8.787 -0.300 0.7924
## Act_Property_Rates 13.199 2.678 4.929 0.0388 *
## Act_Stool_Lands 4.564 13.212 0.345 0.7627
## Act_Licenses -5.613 3.511 -1.599 0.2510
## Act_Fees -3.077 4.376 -0.703 0.5548
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7327000 on 2 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.9372, Adjusted R-squared: 0.7801
## F-statistic: 5.966 on 5 and 2 DF, p-value: 0.1498
# Visualizations
# Scatter plots (Capital_Expenditure vs each land-based revenue)
ggplot(Cleaned_KMA_Data, aes(x = Act_Permit, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Permit Fees", x = "Permit Fees", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Property_Rates, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Property Rates", x = "Property Rates", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Stool_Lands, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Licenses, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Licenses", x = "Licenses", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Fees, y = Capital_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Capital_Expenditure vs Act Fees", x = "Act Fees", y = "Capital_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
cor_matrix_land_Capital_Expenditure <- cor(Cleaned_KMA_Data[, c("Capital_Expenditure", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Capital_Expenditure)
## Capital_Expenditure Act_Permit Act_Property_Rates
## Capital_Expenditure 1.0000000 -0.26523276 0.9191014
## Act_Permit -0.2652328 1.00000000 -0.2004438
## Act_Property_Rates 0.9191014 -0.20044384 1.0000000
## Act_Stool_Lands -0.2422814 0.09591229 -0.3069079
## Act_Licenses 0.1151535 0.15362045 0.4007133
## Act_Fees 0.2231508 -0.39059567 0.2489881
## Act_Stool_Lands Act_Licenses Act_Fees
## Capital_Expenditure -0.24228137 0.1151535 0.2231508
## Act_Permit 0.09591229 0.1536204 -0.3905957
## Act_Property_Rates -0.30690790 0.4007133 0.2489881
## Act_Stool_Lands 1.00000000 -0.1922495 0.2092785
## Act_Licenses -0.19224945 1.0000000 -0.1910590
## Act_Fees 0.20927846 -0.1910590 1.0000000
corrplot(cor_matrix_land_Capital_Expenditure)
The multiple regression of Capital_Expenditure on the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and Act fees) is not statistically significant overall (p = 0.1498), with an R-squared of 0.9372 and an adjusted R-squared of 0.7801. The only individual term that is significant is property rates (p = 0.0388).
The correlation matrix shows that Capital_Expenditure is weakly correlated with all the land-based revenues except property rates, with which it is strongly correlated (r = 0.919).
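The gap between the R-squared (0.9372) and the adjusted R-squared (0.7801) follows from the standard adjustment 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations used and p the number of predictors. The sketch below reproduces that arithmetic for n = 8 and p = 5; the function name `adjusted_r_squared` is an assumption for illustration.

```r
# Hypothetical helper: adjusted R-squared from R-squared, sample size n,
# and number of predictors p (excluding the intercept).
adjusted_r_squared <- function(r_squared, n, p) {
  1 - (1 - r_squared) * (n - 1) / (n - p - 1)
}

# Capital_Expenditure model above: 8 complete observations, 5 predictors.
# The result is approximately 0.780; it differs from the printed 0.7801 only
# in the fourth decimal because the input R-squared is itself rounded.
print(adjusted_r_squared(0.9372, n = 8, p = 5))
```

With n - p - 1 = 2, the penalty factor (n - 1)/(n - p - 1) is 3.5, so every unexplained percentage point of variance is magnified several times in the adjusted figure.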
# Simple linear Regression Analysis
model_permit <- lm(Capital_Expenditure ~ Act_Permit, data = Cleaned_KMA_Data)
model_property <- lm(Capital_Expenditure ~ Act_Property_Rates, data = Cleaned_KMA_Data)
model_stool <- lm(Capital_Expenditure ~ Act_Stool_Lands, data = Cleaned_KMA_Data)
model_license <- lm(Capital_Expenditure ~ Act_Licenses, data = Cleaned_KMA_Data)
model_acts <- lm(Capital_Expenditure ~ Act_Fees, data = Cleaned_KMA_Data)
summary(model_permit)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Permit, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15192242 -12497495 -3618395 7856784 27588096
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29082944.27 16134727.72 1.803 0.122
## Act_Permit -11.70 17.36 -0.674 0.526
##
## Residual standard error: 16270000 on 6 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.07035, Adjusted R-squared: -0.08459
## F-statistic: 0.454 on 1 and 6 DF, p-value: 0.5255
summary(model_property)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Property_Rates, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9071855 -3363387 336245 5656498 6925913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20071128.52 5849842.69 -3.431 0.007496 **
## Act_Property_Rates 10.24 1.56 6.560 0.000104 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6058000 on 9 degrees of freedom
## Multiple R-squared: 0.827, Adjusted R-squared: 0.8078
## F-statistic: 43.04 on 1 and 9 DF, p-value: 0.0001039
summary(model_stool)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Stool_Lands, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19985153 -7966134 -6354819 9217331 25795037
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29674641.00 18501546.98 1.604 0.160
## Act_Stool_Lands -16.11 26.34 -0.612 0.563
##
## Residual standard error: 16370000 on 6 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.0587, Adjusted R-squared: -0.09818
## F-statistic: 0.3742 on 1 and 6 DF, p-value: 0.5632
summary(model_license)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Licenses, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15384646 -8603100 -1885291 3918196 26901380
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9803927.234 27912257.471 -0.351 0.733
## Act_Licenses 3.604 3.798 0.949 0.367
##
## Residual standard error: 13890000 on 9 degrees of freedom
## Multiple R-squared: 0.09097, Adjusted R-squared: -0.01003
## F-statistic: 0.9007 on 1 and 9 DF, p-value: 0.3674
summary(model_acts)
##
## Call:
## lm(formula = Capital_Expenditure ~ Act_Fees, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13613164 -10766544 -2042074 6390149 25312615
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -602953.801 15244823.805 -0.040 0.969
## Act_Fees 2.263 1.956 1.157 0.277
##
## Residual standard error: 13590000 on 9 degrees of freedom
## Multiple R-squared: 0.1295, Adjusted R-squared: 0.03276
## F-statistic: 1.339 on 1 and 9 DF, p-value: 0.2771
The simple linear regressions of Capital_Expenditure on each land-based revenue found only property rates to be significant (p = 0.000104); the rest were not.
# Recurrent Expenditure Trend
ggplot(Cleaned_KMA_Data, aes(x = Year, y = Recrrent_Expenditure)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Recurrent Expenditure Trend",
x = "Year",
y = "Recurrent Expenditure (Ghana Cedis)"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trend",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Recurrent Expenditure and Land-Based Revenue Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent_Expenditure"), size = 1) +
labs(
title = "Recurrent Exp. vs. Land-Based Revenue Trend",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
The plots above show the trends in Recurrent Expenditure and in each land-based revenue stream over the years.
# Recurrent Expenditure vs Land-Based Revenues
model_recurrent_Expenditure_land <- lm(Recrrent_Expenditure ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_KMA_Data)
summary(model_recurrent_Expenditure_land)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Permit + Act_Property_Rates +
## Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_KMA_Data)
##
## Residuals:
## 4 5 6 7 8 9 10 11
## -485555 -664406 1330041 150825 -822559 293784 -350174 548045
## attr(,"label")
## [1] "Recrrent Expenditure"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -15412608.2597 8923048.0374 -1.727 0.2263
## Act_Permit -3.4078 1.6210 -2.102 0.1703
## Act_Property_Rates -1.4063 0.4940 -2.847 0.1044
## Act_Stool_Lands 4.4304 2.4374 1.818 0.2108
## Act_Licenses 2.6104 0.6478 4.030 0.0564 .
## Act_Fees 2.1101 0.8073 2.614 0.1205
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1352000 on 2 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.9405, Adjusted R-squared: 0.7917
## F-statistic: 6.319 on 5 and 2 DF, p-value: 0.1422
# Visualizations
# Scatter plots (Recurrent Expenditure vs each land-based revenue)
ggplot(Cleaned_KMA_Data, aes(x = Act_Permit, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Permit Fees", x = "Permit Fees", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Property_Rates, y = Recrrent_Expenditure))+
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Property Rates", x = "Property Rates", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Stool_Lands, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Licenses, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Licenses", x = "Licenses", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Fees, y = Recrrent_Expenditure)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Recurrent_Expenditure vs Act Fees", x = "Act Fees", y = "Recurrent_Expenditure") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
cor_matrix_land_recurrent_Expenditure <- cor(Cleaned_KMA_Data[, c("Recrrent_Expenditure", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_recurrent_Expenditure)
## Recrrent_Expenditure Act_Permit Act_Property_Rates
## Recrrent_Expenditure 1.0000000 -0.33921055 -0.1812179
## Act_Permit -0.3392105 1.00000000 -0.2004438
## Act_Property_Rates -0.1812179 -0.20044384 1.0000000
## Act_Stool_Lands 0.4602729 0.09591229 -0.3069079
## Act_Licenses 0.3314177 0.15362045 0.4007133
## Act_Fees 0.4718360 -0.39059567 0.2489881
## Act_Stool_Lands Act_Licenses Act_Fees
## Recrrent_Expenditure 0.46027287 0.3314177 0.4718360
## Act_Permit 0.09591229 0.1536204 -0.3905957
## Act_Property_Rates -0.30690790 0.4007133 0.2489881
## Act_Stool_Lands 1.00000000 -0.1922495 0.2092785
## Act_Licenses -0.19224945 1.0000000 -0.1910590
## Act_Fees 0.20927846 -0.1910590 1.0000000
corrplot(cor_matrix_land_recurrent_Expenditure)
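The multiple regression of recurrent expenditure (Recrrent_Expenditure) on all the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is not statistically significant overall (F = 6.319 on 5 and 2 DF, p-value = 0.1422). The R-squared of 0.9405 and adjusted R-squared of 0.7917 look high, but the model estimates six coefficients from only 8 complete observations, leaving 2 residual degrees of freedom, so the fit should be read with caution. No individual coefficient is significant at the 5% level; Act_Licenses comes closest (p = 0.0564).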
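The correlation matrix shows that recurrent expenditure is only weakly to moderately correlated with the land-based revenues: the coefficients range from -0.34 (permit fees) to 0.47 (fees), with stool lands revenue at 0.46.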
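The gap between the R-squared and the adjusted R-squared in the summary above comes from the small sample. As a minimal sketch, the adjusted value can be reproduced from the printed figures; `n_obs` and `n_pred` are illustrative names, and the 8 observations are the 11 rows less the 3 deleted due to missingness.

```r
# Figures taken from the summary output above.
r_squared <- 0.9405
n_obs     <- 8   # 11 rows minus 3 deleted due to missingness
n_pred    <- 5   # land-based revenue predictors in the model

# Residual degrees of freedom: observations minus estimated coefficients
# (five slopes plus the intercept).
df_resid <- n_obs - n_pred - 1

# Adjusted R-squared penalises the fit for the number of predictors used.
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / df_resid

print(df_resid)
print(adj_r_squared)
```

The result agrees with the reported 0.7917 up to rounding. With only 2 residual degrees of freedom, the penalty factor `(n_obs - 1) / df_resid` is 3.5, which is why the adjusted figure sits so far below the raw R-squared.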
# Simple linear Regression Analysis
model_permit <- lm(Recrrent_Expenditure ~ Act_Permit, data = Cleaned_KMA_Data)
model_property <- lm(Recrrent_Expenditure ~ Act_Property_Rates, data = Cleaned_KMA_Data)
model_stool <- lm(Recrrent_Expenditure ~ Act_Stool_Lands, data = Cleaned_KMA_Data)
model_license <- lm(Recrrent_Expenditure ~ Act_Licenses, data = Cleaned_KMA_Data)
model_acts <- lm(Recrrent_Expenditure ~ Act_Fees, data = Cleaned_KMA_Data)
summary(model_permit)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Permit, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4174996 -1452643 -621570 1760658 4173820
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20031116.847 2983757.708 6.713 0.000531 ***
## Act_Permit -2.836 3.211 -0.883 0.411085
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3009000 on 6 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.1151, Adjusted R-squared: -0.03243
## F-statistic: 0.7801 on 1 and 6 DF, p-value: 0.4111
summary(model_property)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Property_Rates, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8302918 -2182824 411529 2151138 6647327
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17015137.928 4270616.467 3.984 0.00319 **
## Act_Property_Rates 0.103 1.139 0.090 0.92995
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4422000 on 9 degrees of freedom
## Multiple R-squared: 0.0009073, Adjusted R-squared: -0.1101
## F-statistic: 0.008173 on 1 and 9 DF, p-value: 0.9299
summary(model_stool)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Stool_Lands, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4150510 -1834712 273161 1826093 3333845
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13698408.936 3208885.678 4.269 0.00527 **
## Act_Stool_Lands 5.802 4.568 1.270 0.25113
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2840000 on 6 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.2119, Adjusted R-squared: 0.08049
## F-statistic: 1.613 on 1 and 6 DF, p-value: 0.2511
summary(model_license)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Licenses, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4999784 -3136048 -465994 2689694 6968791
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5234470.162 7893138.010 0.663 0.524
## Act_Licenses 1.672 1.074 1.557 0.154
##
## Residual standard error: 3927000 on 9 degrees of freedom
## Multiple R-squared: 0.2121, Adjusted R-squared: 0.1246
## F-statistic: 2.423 on 1 and 9 DF, p-value: 0.154
summary(model_acts)
##
## Call:
## lm(formula = Recrrent_Expenditure ~ Act_Fees, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5849108 -3116094 -257623 2241853 7550731
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12478112.3070 4663859.2132 2.675 0.0254 *
## Act_Fees 0.6532 0.5984 1.092 0.3034
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4158000 on 9 degrees of freedom
## Multiple R-squared: 0.1169, Adjusted R-squared: 0.0188
## F-statistic: 1.192 on 1 and 9 DF, p-value: 0.3034
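The simple linear regressions of recurrent expenditure on each land-based revenue found none to be significant at the 5% level (p-values: permit fees 0.411, property rates 0.930, stool lands revenue 0.251, licenses 0.154, fees 0.303).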
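The five simple regressions above can also be collected into one table, which makes the differing sample sizes visible (8 rows where a predictor has missing values, 11 otherwise). The following is a sketch only: `simple_fits` is a hypothetical helper that is not part of the original analysis, and the toy data are made up purely to show the call.

```r
# Fit one simple regression per predictor and collect the key figures.
# `simple_fits` is a hypothetical helper, not part of the original analysis.
simple_fits <- function(data, response, predictors) {
  rows <- lapply(predictors, function(p) {
    fit   <- lm(reformulate(p, response = response), data = data)
    coefs <- summary(fit)$coefficients
    data.frame(
      predictor = p,
      n_used    = nobs(fit),               # rows left after dropping NAs
      slope     = coefs[p, "Estimate"],
      p_value   = coefs[p, "Pr(>|t|)"],
      r_squared = summary(fit)$r.squared
    )
  })
  do.call(rbind, rows)
}

# Made-up toy data, only to illustrate the call; these are not the KMA figures.
toy <- data.frame(
  y  = c(10, 12, 15, 19, 22, 24, 29, 31),
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(5, NA, 4, 6, NA, 5, 7, 3)
)
result <- simple_fits(toy, "y", c("x1", "x2"))
print(result)
```

On the document's data the equivalent call would be `simple_fits(Cleaned_KMA_Data, "Recrrent_Expenditure", c("Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees"))`.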
# Population Trend
ggplot(Cleaned_KMA_Data, aes(x = Year, y = Population)) +
geom_line(color = "blue", size = 1) +
geom_point(size = 2.5) +
labs(
title = "Population Trend Over Years",
x = "Year",
y = "Population"
) +
scale_y_continuous(labels = comma) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(face = "bold")
)
# Land-Based Revenue Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
labs(
title = "Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis)",
color = "Revenue Type"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
# Population and Land-Based Revenue Trends
ggplot(Cleaned_KMA_Data, aes(x = Year)) +
geom_line(aes(y = Act_Permit, color = "Permit Fees"), size = 1) +
geom_line(aes(y = Act_Property_Rates, color = "Property Rates"), size = 1) +
geom_line(aes(y = Act_Stool_Lands, color = "Stool Lands Revenue"), size = 1) +
geom_line(aes(y = Act_Licenses, color = "Licenses"), size = 1) +
geom_line(aes(y = Act_Fees, color = "Act_Fees"), size = 1) +
geom_line(aes(y = Population, color = "Population"), size = 1) +
labs(
title = "Population vs. Land-Based Revenue Trends Over Years",
x = "Year",
y = "Revenue (Ghana Cedis) / Population (persons)",
color = "Series"
) +
scale_y_continuous(labels = comma) +
theme(
legend.position = "right",
legend.title = element_text(face = "bold"),
plot.title = element_text(hjust = 0.5, face = "bold")
)
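The plot above shows how population and each land-based revenue stream move over the years. Population is measured in persons and the revenues in Ghana Cedis, so the lines share an axis but not a unit.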
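Because population and revenue are in different units, one option for comparing their trends on a single axis is to rescale each series to an index with its first available year set to 100. This is a sketch under that assumption, not the approach used in the plots above; `to_index` is a hypothetical helper and the values are made up.

```r
# Rescale a series so that its first non-missing value equals 100.
# `to_index` is a hypothetical helper, not part of the original analysis.
to_index <- function(x) {
  base <- x[which(!is.na(x))[1]]
  100 * x / base
}

# Made-up values, only to illustrate the rescaling.
population <- c(2000000, 2100000, 2200000)
revenue    <- c(NA, 400000, 500000)

population_index <- to_index(population)
revenue_index    <- to_index(revenue)

print(population_index)
print(revenue_index)
```

Plotting the indexed series would show relative growth rather than levels, so a revenue line would no longer be flattened by the much larger population values.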
# Population vs Land-Based Revenues
model_Population_land <- lm(Population ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands + Act_Licenses + Act_Fees, data = Cleaned_KMA_Data)
summary(model_Population_land)
##
## Call:
## lm(formula = Population ~ Act_Permit + Act_Property_Rates + Act_Stool_Lands +
## Act_Licenses + Act_Fees, data = Cleaned_KMA_Data)
##
## Residuals:
## 4 5 6 7 8 9 10 11
## -96849 -7429 -53513 -238472 -230248 295510 69705 261296
## attr(,"label")
## [1] "Population"
## attr(,"format.spss")
## [1] "F8.0"
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 391135.1863 2481400.6040 0.158 0.889
## Act_Permit 0.2099 0.4508 0.466 0.687
## Act_Property_Rates -0.2567 0.1374 -1.868 0.203
## Act_Stool_Lands -0.3407 0.6778 -0.503 0.665
## Act_Licenses 0.2287 0.1801 1.270 0.332
## Act_Fees 0.2319 0.2245 1.033 0.410
##
## Residual standard error: 375900 on 2 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.6822, Adjusted R-squared: -0.1123
## F-statistic: 0.8587 on 5 and 2 DF, p-value: 0.6156
# Visualizations
# Scatter plots (Population vs each land-based revenue)
ggplot(Cleaned_KMA_Data, aes(x = Act_Permit, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Permit Fees", x = "Permit Fees", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Property_Rates, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Property Rates", x = "Property Rates", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Stool_Lands, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Stool Lands Revenue", x = "Stool Lands Revenue", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Licenses, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Licenses", x = "Licenses", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
ggplot(Cleaned_KMA_Data, aes(x = Act_Fees, y = Population)) +
geom_point() +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Population vs Act Fees", x = "Act Fees", y = "Population") +
scale_x_continuous(labels = comma) +
scale_y_continuous(labels = comma)
cor_matrix_land_Population <- cor(Cleaned_KMA_Data[, c("Population", "Act_Permit", "Act_Property_Rates", "Act_Stool_Lands", "Act_Licenses", "Act_Fees")], use = "complete.obs")
print(cor_matrix_land_Population)
## Population Act_Permit Act_Property_Rates Act_Stool_Lands
## Population 1.00000000 0.27028219 -0.5475932 0.07381541
## Act_Permit 0.27028219 1.00000000 -0.2004438 0.09591229
## Act_Property_Rates -0.54759316 -0.20044384 1.0000000 -0.30690790
## Act_Stool_Lands 0.07381541 0.09591229 -0.3069079 1.00000000
## Act_Licenses 0.19663860 0.15362045 0.4007133 -0.19224945
## Act_Fees 0.02579506 -0.39059567 0.2489881 0.20927846
## Act_Licenses Act_Fees
## Population 0.1966386 0.02579506
## Act_Permit 0.1536204 -0.39059567
## Act_Property_Rates 0.4007133 0.24898809
## Act_Stool_Lands -0.1922495 0.20927846
## Act_Licenses 1.0000000 -0.19105904
## Act_Fees -0.1910590 1.00000000
corrplot(cor_matrix_land_Population)
The multiple regression of Population on all the land-based revenues (permit fees, property rates, stool lands revenue, licenses, and fees) is not statistically significant (F = 0.8587 on 5 and 2 DF, p-value = 0.6156). The R-squared is 0.6822, but the adjusted R-squared of -0.1123 indicates that the model fits poorly once the number of predictors is taken into account.
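The correlation matrix, based on the 8 complete observations, shows that Population is weakly correlated with most of the land-based revenues (coefficients between 0.03 and 0.27 in absolute value). The exception is property rates, which has a moderate negative correlation with Population (-0.55).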
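The overall F-statistic and its p-value in the regression summary above are tied directly to the R-squared, which helps keep the three figures apart. As a minimal sketch using only the printed values (the variable names are illustrative):

```r
# Figures taken from the Population regression summary above.
r_squared <- 0.6822
n_obs     <- 8   # 11 rows minus 3 deleted due to missingness
n_pred    <- 5   # land-based revenue predictors in the model
df_resid  <- n_obs - n_pred - 1

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom.
f_stat <- (r_squared / n_pred) / ((1 - r_squared) / df_resid)

# Upper-tail probability of the F distribution gives the model p-value.
p_value <- pf(f_stat, df1 = n_pred, df2 = df_resid, lower.tail = FALSE)

print(f_stat)
print(p_value)
```

Both values agree with the summary up to rounding: the F-statistic is about 0.8587 and the p-value about 0.6156, while 0.6822 is the R-squared.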
# Simple linear Regression Analysis
model_permit <- lm(Population ~ Act_Permit, data = Cleaned_KMA_Data)
model_property <- lm(Population ~ Act_Property_Rates, data = Cleaned_KMA_Data)
model_stool <- lm(Population ~ Act_Stool_Lands, data = Cleaned_KMA_Data)
model_license <- lm(Population ~ Act_Licenses, data = Cleaned_KMA_Data)
model_acts <- lm(Population ~ Act_Fees, data = Cleaned_KMA_Data)
summary(model_permit)
##
## Call:
## lm(formula = Population ~ Act_Permit, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -496450 -241057 14939 330345 361037
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2891373.6429 367542.1699 7.867 0.000223 ***
## Act_Permit 0.2720 0.3955 0.688 0.517361
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 370700 on 6 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.07305, Adjusted R-squared: -0.08144
## F-statistic: 0.4729 on 1 and 6 DF, p-value: 0.5174
summary(model_property)
##
## Call:
## lm(formula = Population ~ Act_Property_Rates, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -671576 -368321 -29895 367238 714142
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2870590.63093 479107.44700 5.992 0.000205 ***
## Act_Property_Rates 0.01308 0.12780 0.102 0.920713
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 496100 on 9 degrees of freedom
## Multiple R-squared: 0.001163, Adjusted R-squared: -0.1098
## F-statistic: 0.01048 on 1 and 9 DF, p-value: 0.9207
summary(model_stool)
##
## Call:
## lm(formula = Population ~ Act_Stool_Lands, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -514057 -245300 -8773 258805 476424
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3052791.3272 433846.1301 7.037 0.000412 ***
## Act_Stool_Lands 0.1120 0.6176 0.181 0.862098
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 384000 on 6 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.005449, Adjusted R-squared: -0.1603
## F-statistic: 0.03287 on 1 and 6 DF, p-value: 0.8621
summary(model_license)
##
## Call:
## lm(formula = Population ~ Act_Licenses, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -382038 -286705 -165999 295201 726359
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1067117.6287 778731.0750 1.370 0.2038
## Act_Licenses 0.2546 0.1059 2.403 0.0397 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 387500 on 9 degrees of freedom
## Multiple R-squared: 0.3908, Adjusted R-squared: 0.3231
## F-statistic: 5.774 on 1 and 9 DF, p-value: 0.03971
summary(model_acts)
##
## Call:
## lm(formula = Population ~ Act_Fees, data = Cleaned_KMA_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -472045 -220785 -35845 95724 618711
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1742422.78828 380538.03480 4.579 0.00133 **
## Act_Fees 0.15649 0.04883 3.205 0.01074 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 339200 on 9 degrees of freedom
## Multiple R-squared: 0.533, Adjusted R-squared: 0.4811
## F-statistic: 10.27 on 1 and 9 DF, p-value: 0.01074
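The simple linear regressions of Population on each land-based revenue found two significant predictors at the 5% level: licenses (slope 0.2546, p = 0.0397, R-squared 0.3908) and fees (slope 0.1565, p = 0.0107, R-squared 0.533). Permit fees (p = 0.517), property rates (p = 0.921), and stool lands revenue (p = 0.862) are not significant.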
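As a quick check on the coefficient tables above, the slope p-values can be compared against a threshold in one step. This sketch copies the printed p-values by hand; the 0.05 threshold is the conventional choice and is an assumption here, and `r2_from_t` is a hypothetical helper.

```r
# Slope p-values copied from the five Population summaries above.
p_values <- c(
  Act_Permit         = 0.517361,
  Act_Property_Rates = 0.920713,
  Act_Stool_Lands    = 0.862098,
  Act_Licenses       = 0.0397,
  Act_Fees           = 0.01074
)

alpha <- 0.05  # conventional significance threshold (assumed)
significant <- names(p_values)[p_values < alpha]
print(significant)

# In a simple regression, R-squared follows from the slope's t value
# and the residual degrees of freedom.
r2_from_t <- function(t_value, df_resid) t_value^2 / (t_value^2 + df_resid)
r2_fees <- r2_from_t(3.205, 9)
print(r2_fees)
```

The second check recovers the R-squared reported for `Act_Fees` (about 0.533) from its t value of 3.205 on 9 degrees of freedom.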